Date | Commit message |
|
Large string processing + concurrency + caching/memoization
really brings out the worst in glibc malloc :<
|
|
|
|
I may be mistaken, but I suspect jemalloc handles long-lived
processes better than glibc because its granularity reduction
is scaled to larger size classes. This can waste 20% of an
individual allocation, but increases the likelihood of reuse
(without splitting/consolidating into other sizes).
In other words, glibc seems to try too hard to make the best fit
for initial allocations. This ends up being suboptimal over
time as those allocations are freed and similar (but not
identical) allocations come in. jemalloc sacrifices the best
initial fit for better fits over a long process lifetime.
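
The trade-off above can be illustrated with a rough sketch. It assumes four size classes per power-of-two doubling, which is what produces a worst case of roughly 20% waste per allocation. The names `size_class`, `waste`, `classes_per_doubling` and `min_class` are hypothetical and chosen for illustration; this is not jemalloc's actual API or its exact class table.

```python
def size_class(request, classes_per_doubling=4, min_class=16):
    """Round a request up to a coarse size class (assumed spacing)."""
    if request <= min_class:
        return min_class
    # Largest power of two strictly below the request.
    base = 1 << ((request - 1).bit_length() - 1)
    # Class spacing grows with the size: coarser classes for bigger requests.
    step = base // classes_per_doubling
    # Round up to the next multiple of step.
    return -(-request // step) * step


def waste(request):
    """Fraction of the chosen size class left unused by the request."""
    cls = size_class(request)
    return (cls - request) / cls


if __name__ == "__main__":
    # Similar (but not identical) requests land in the same class,
    # so a freed chunk can be reused without splitting or consolidating.
    for n in (1025, 1100, 1200, 1280):
        print(n, size_class(n), round(waste(n), 3))
```

Under these assumptions, requests of 1100 and 1200 bytes share one class, which is the reuse-friendly behavior the message attributes to jemalloc.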
|
|
This can be a multi-process daemon, but systemd should only kill
the top-level one. And also finish a comment about the User
having access to the shared private key.
|
|
We don't want the milter to munge List-Unsubscribe headers from
external (incoming) mlmmj lists, only lists hosted on the server
running unsubscribe.milter.
Adding support for an allow_domains file should've been enough,
but this further restricts the milter to only operating on Postfix
connections from localhost.
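
A minimal sketch of the two checks described above, assuming the milter knows the connecting client address and the domain of the list being handled. The function name, its parameters and the set-based `allow_domains` are hypothetical and are not taken from unsubscribe.milter itself.

```python
import ipaddress


def should_rewrite(client_addr, list_domain, allow_domains):
    """Rewrite List-Unsubscribe only for local connections and allowed domains."""
    try:
        is_local = ipaddress.ip_address(client_addr).is_loopback
    except ValueError:
        # Unparseable address: treat it as external and leave the mail alone.
        is_local = False
    return is_local and list_domain.lower() in allow_domains


if __name__ == "__main__":
    allowed = {"lists.example.com"}
    print(should_rewrite("127.0.0.1", "lists.example.com", allowed))
    print(should_rewrite("203.0.113.5", "lists.example.com", allowed))
```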
|
|
The whitelist was only used in the early days of its development
and hasn't existed for a while. I've largely forgotten this
thing exists since it's been working well...
|
|
There's no need to send SIGUSR1 to auxiliary processes since
they don't know what to do with it.
|
|
systemd complains about `User=nobody' since `nobody' has access
to all files which can't be mapped to a valid UID. We'll also
switch to `Group=ssl-cert' since that ought to be able to read
TLS certificates.
|
|
|
|
systemd (247.3-7+deb11u1 on Debian 11.x) considers them "obsolete" and
emits the following to my syslog:
Standard output type syslog is obsolete, automatically updating to journal.
Please update your unit file, and consider removing the setting altogether.
So we'll remove it altogether, as I'm sticking with rsyslog for now.
|
|
public-inbox-httpd has always been designed to handle slow
clients efficiently via non-blocking sockets and epoll|kqueue.
Thus the proxy buffering capabilities of nginx were a needless
waste of memory and filesystem traffic, and increased response
latency.
nginx does provide an HTTPS-capable reverse proxy to talk to
varnish; however, any other HTTPS-capable reverse proxy works,
too.
|
|
This is likely more familiar to readers of TAP (Test Anything
Protocol) output, as well as shell and Perl scripters who also
use `#' for comments.
AFAIK, nobody is parsing our stderr, and I'm not sure how
standardized the `I:' prefix is (nor the `W:' and `E:' ones). It's
already the prevailing style in Lei* code, too, so things have
been moving in that direction for a bit.
|
|
It's important to show that a single systemd service and socket file
can replace all other read-only daemons for ease-of-management.
|
|
systemd.socket(5) files can actually contain multiple listen
sockets, so shave down inode overhead and simplify config
file management by consolidating all applicable ports into
a single file for each daemon.
|
|
Having old, unmaintained docs for other HTTP servers is likely
harmful at this point. public-inbox-httpd is specifically
designed to handle git repos on slow storage and stream giant
mbox.gz files fairly to slow clients.
|
|
This allows unambiguous signaling to some MUAs and webmail clients
that the List-Unsubscribe header contains an instantaneous
unsubscribe option.
|
|
Sendmail::PMilter requires an IO::Socket object, not a GLOB.
Fixes: e901a56b3b30b22f (treewide: favor open(..., '+<&=', $fd), 2021-05-21)
|
|
While git respects a user's local timezone and returns
seconds-since-the-Epoch, we were unnecessarily and incorrectly
calling gmtime+strftime on its result. So skip calling
gmtime+strftime when the strftime format is "%s" and just feed
the output time from git directly to Xapian.
This is mainly for lei, which will likely run in a variety of
timezones. While we're at it, add a recommendation to use
TZ=UTC in public-inbox-httpd, in case there are (misguided :P)
sysadmins who set a non-UTC TZ.
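
The timezone problem can be sketched as follows. The sketch assumes that formatting a gmtime() result with "%s" amounts to treating the broken-down UTC time as local time, which is the conversion mktime() performs; the helper names are hypothetical. The POSIX TZ strings "UTC0" and "EST5" are used so the example does not depend on an installed timezone database.

```python
import os
import time


def epoch_via_gmtime(epoch):
    # mktime() interprets the broken-down time as *local* time, so a
    # gmtime() -> mktime() round trip is only lossless when TZ is UTC.
    return int(time.mktime(time.gmtime(epoch)))


def skew_under_tz(epoch, tz):
    """Return the error (in seconds) of the round trip under a given TZ."""
    old = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # Unix-only
    try:
        return epoch_via_gmtime(epoch) - epoch
    finally:
        # Restore the caller's timezone settings.
        if old is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = old
        time.tzset()


if __name__ == "__main__":
    commit_time = 1600000000  # seconds since the Epoch, as git reports it
    print(skew_under_tz(commit_time, "UTC0"))
    print(skew_under_tz(commit_time, "EST5"))
```

Passing the epoch value through unchanged avoids the skew regardless of TZ, which is why it is safe to hand git's output straight to the indexer.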
|
|
Cut down on unnecessary imports of IO::Handle and
method lookup + dispatch overhead.
|
|
<tt> doesn't seem necessary and it's deprecated in HTML, nowadays.
In any case, dillo's CSS support seems to show it as fixed-width
even without <tt>. Use the title= attribute to highlight that
it goes to the mail thread, too.
In the future, we'll probably link to something like "lei p2q"
(patch-to-query) to include OIDs in the search.
|
|
Our HTTP daemon is `public-inbox-httpd', not
`public-inbox-http'.
|
|
With an example such as:
something before "quoted phrase" something after
Xapian will now see:
[ "something before", "quoted phrase", "something after" ]
whereas before it would see:
[ "something before", "quoted", "phrase", "something after" ]
which should improve search result accuracy when looking
up commits by commit title (subject).
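
A minimal sketch of the splitting shown in the example above, assuming double-quoted runs are kept whole and the unquoted text around them is passed through as-is. `split_phrases` is a hypothetical helper, not the function used in the codebase.

```python
import re


def split_phrases(subject):
    """Split a subject into chunks, keeping each "quoted phrase" intact."""
    # With a capture group, re.split() returns the quoted contents as
    # their own elements between the surrounding unquoted text.
    parts = re.split(r'"([^"]*)"', subject)
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    print(split_phrases('something before "quoted phrase" something after'))
```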
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places for now, since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object.
|
|
Unlike DBD::SQLite, the sqlite3(1) CLI does not have a default
busy timeout enabled, so it easily times out while acquiring a
SHARED lock for read-only queries. We can avoid battery-wasting
polling from the SQLite timeout handler by relying on flock(2)
as we do in our Perl code.
Furthermore, this avoids triggering some locking problems[1]
from a long "SELECT COUNT(*) ..." query and reindex.
While there may be other SQLite-related parallelism issues[1],
this works around one of them by relying on flock(2).
[1] https://public-inbox.org/meta/20200825001204.GA840@dcvr/
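
The flock(2) approach can be sketched like this. The lock file path, the table name and the choice of a shared lock for readers are assumptions made for illustration (it presumes writers take an exclusive lock on the same file); they are not taken from the script this commit touches.

```python
import fcntl
import sqlite3


def locked_count(db_path, lock_path, table):
    """Run a read-only COUNT(*) while holding a flock(2) lock."""
    with open(lock_path, "a") as lock_fh:
        # Blocks in the kernel until the lock is granted; no polling loop
        # and no reliance on an SQLite busy timeout.
        fcntl.flock(lock_fh, fcntl.LOCK_SH)
        try:
            con = sqlite3.connect(db_path)
            try:
                (n,) = con.execute(
                    f'SELECT COUNT(*) FROM "{table}"').fetchone()
                return n
            finally:
                con.close()
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)
```

From a shell, wrapping the sqlite3(1) invocation in flock(1) on the same lock file would give a comparable effect.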
|
|
We've got examples for all the other daemons, too!
|
|
--sequential-shard offers better performance on HDD than -j0
since the on-disk active set can be kept small (with -j $HIGH_NUM).
--batch-size can also be helpful for systems with much RAM.
|
|
I finally noticed descriptions weren't showing up in my mirrors :x
|
|
grok-pull is still painful with serialization on an old USB 2.0
HDD, but at least it can finish with flock(1) and disabling
parallelization. While parallel "git fetch" doesn't seem so
bad, slow seeks are exacerbated by parallel reads in Xapian.
That means some updates can take days instead of hours. The
same updates take only seconds or minutes on an SSD.
|
|
Instead of gzipping some (mbox.gz, manifest.js.gz) responses and
leaving P::M::D to do the rest, we gzip everything ourselves,
now, so P::M::D is redundant.
|
|
Users are encouraged to edit this script, anyways, so make it
easy for them to swap out and use whatever URL they need.
|
|
The values of infourl parameters are shared in the config, so
include them in the mirror.
|
|
The $INBOX_URL/description endpoint is available since v1.3.0,
so use it in mirrors.
|
|
public-inbox-httpd should work with any PSGI files, so make
it more apparent to people reading .psgi examples.
|
|
It was the only file in our tree which had CRLF line endings,
so make it consistent with the rest.
|
|
I didn't wait until September to do it, this year!
|
|
Instead of providing a generic "mailto:foo+unsubscribe@example.com"
address in List-Unsubscribe which requires confirmation, replace it
with a mailto: URL with a unique subject which contains the same
unique ID we put in the https:// URL.
This makes it easier for some MUAs without https:// support to
unsubscribe with a single action via the List-Unsubscribe header.
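
A rough sketch of the resulting header value, assuming one unique ID is embedded in both the https:// URL and the mailto: subject. The function name, the base URL layout and the addresses are hypothetical placeholders.

```python
from urllib.parse import quote


def list_unsubscribe(unique_id, https_base, mailto_addr):
    """Build a List-Unsubscribe value carrying the same ID in both URLs."""
    uid = quote(unique_id, safe="")  # escape anything unsafe in a URL
    https_url = f"{https_base.rstrip('/')}/{uid}"
    mailto_url = f"mailto:{mailto_addr}?subject={uid}"
    return f"<{https_url}>, <{mailto_url}>"


if __name__ == "__main__":
    print(list_unsubscribe("abc123", "https://example.com/u/",
                           "unsubscribe@example.com"))
```

An MUA that only understands mailto: can then send a message whose subject alone identifies the subscription, with no confirmation round trip.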
|
|
Mail to gmane is being delivered to gmane-mx.org, nowadays, and
we don't want ordinary readers to be able to trigger unconfirmed
unsubscription from any mailing lists which go through our
unsubscribe.milter.
https://lars.ingebrigtsen.no/2020/01/06/whatever-happened-to-news-gmane-org/
|
|
This is necessary for Filesys::Notify::Simple 0.13 using
Linux::Inotify2, since 0.13 started croaking on
inotify_add_watch failures.
|
|
We need to account for both the old ("mainrepo") and new
("inboxdir") names. But "dir" was just a search+replace
error and we don't use that outside of "coderepo.dir".
|
|
"mainrepo" was a bad name and an artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|
|
Move away from using "mainrepo" since it's confusing to
new users, especially with v2.
|
|
This requires the latest (to be in 1.2) -init changes for
synchronization and has no dependencies on GNU or bash-isms
so it should run on *BSD systems without GNU tools.
It does attempt to use curl on <$INBOX_URL/_/text/config/raw>
(curl is fairly standard nowadays), and falls back to using
an invalid address to initialize.
|
|
NNTPS and STARTTLS seem to have been working for several months
without incident on news.public-inbox.org, so consider it a
success and maybe others can try using it.
HTTPS technically works, too, but isn't documented at
the moment since I can't recommend production deployments
without varnish protecting it.
|
|
|
|
For users running multiple (-nntpd@1, -nntpd@2) instances of
either -httpd or -nntpd via systemd to implement zero-downtime
restarts, it's possible for a listen socket to become blocking
for a moment during an accept syscall and cause a daemon to
get stuck in a blocking accept() during
PublicInbox::Listener::event_step (event_read in previous
versions).
Since O_NONBLOCK is a file description flag, systemd clearing
O_NONBLOCK momentarily (before PublicInbox::Listener::new
re-enables it) creates a window for another instance of our
daemon to get stuck in accept().
cf. systemd.service(5)
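
The shared-flag behavior can be demonstrated with a small sketch. Here dup(2) stands in for a socket shared between processes, since both cases refer to the same open file description; the function name is hypothetical.

```python
import os
import socket


def shared_nonblock_demo():
    """Show that O_NONBLOCK is shared by descriptors of one file description."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    dup_fd = os.dup(sock.fileno())  # second descriptor, same description
    try:
        # Setting O_NONBLOCK through one descriptor...
        os.set_blocking(sock.fileno(), False)
        # ...is visible through the other one.
        dup_sees_nonblocking = not os.get_blocking(dup_fd)
        # Clearing it via the duplicate flips the original back to
        # blocking, which is the window the message above describes.
        os.set_blocking(dup_fd, True)
        original_is_blocking = os.get_blocking(sock.fileno())
        return dup_sees_nonblocking, original_is_blocking
    finally:
        os.close(dup_fd)
        sock.close()


if __name__ == "__main__":
    print(shared_nonblock_demo())
```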
|
|
The sample configuration can be used to proxy-pass requests
to public-inbox-httpd or to a standalone PSGI/Plack server.
|
|
Deflating responses may be done by the reverse proxy (e.g. varnish
or nginx), so the warning for it could be invalid.
|
|
It's been a while since I wrote this, and it needs to be kept
up-to-date with some advances in our Perl code.
|
|
I'm using this as the cgit about-filter and source-filter
in https://80x24.org/public-inbox.git
|