Date | Commit message |
|
If run with PERL_INLINE_DIRECTORY for Inline::C support
along with INBOX_DEBUG=malloc_info, we can allow users
to opt-in to compiling extra code to support the glibc
malloc_info(3) function.
We'll also add a SIGCONT handler to dump the malloc_info(3)
output to stderr on our daemons.
|
|
Well, it could probably be moved to contrib...
|
|
The version of Test::More from Perl 5.10.1 did not support
"subtest", and the earliest version which did is Perl 5.12.0.
The good news is this gives me an excuse to parallelize
the indexlevels-mirror test by splitting it into two.
(It could be further split, even.)
Update t/nntpd. to use PI_TEST_VERSION consistently while
we're at it.
|
|
|
|
Even though we currently don't use it repeatedly, ->Reset
should close() kqueue FDs and not cause the process to run
out of descriptors.
Add a close-on-exec test while we're at it.
|
|
Oops :x
|
|
copydatabase(1) is an existing Xapian tool which is the
recommended way to upgrade existing DBs to the latest Xapian
database format (currently "glass" for stable/released
versions). Our use of Xapian relies on preserving document IDs,
so we'll wrap it like we do xapian-compact(1) and use the
"--no-renumber" switch.
I could not name the tool "public-inbox-copydatabase" since it
would be ambiguous as to which DB it's actually copying. So, I
abbreviated the suffix to "xcpdb" (Xapian CoPy DataBase), which
I hope is acceptable and unambiguous.
|
|
Port public-inbox-compact(1) over to using it, and we will need
to wrap copydatabase(1) to ease glass migrations, too.
|
|
In retrospect, introducing V1Writable was unnecessary and
InboxWritable->importer is in a better position to abstract
away differences between v1 and v2 writers.
So teach InboxWritable to initialize inboxes and get rid
of V1Writable.
|
|
Preventative measures, since marketing is almost always annoying
to me, and an attempt to avoid unintended consequences.
|
|
We were reindexing the full history every invocation of -index
when Xapian was not used because we were incorrectly relying on
'last_commit' metadata stored in Xapian.
Rewrite the indexing logic to be less confusing while we're
at it, since we rely on `git merge-base --is-ancestor' nowadays.
Furthermore, we need to handle message removals from the
overview index correctly when Xapian is not in use.
Co-authored-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Import initialization is a little strange from history, but we
also can't change it too much because it's technically a public
API which external code may rely on...
And we may need to support v1 repos indefinitely. This should
make it easier to write tests for both formats.
|
|
This should make it easier to test a bunch of package
installation profiles across whatever OS isolation
one chooses (chroots, containers, jails, VMs).
|
|
* origin/danga-bundle:
DS: epoll: fix misordered EPOLL_CTL_DEL call
DS: drop unused "_undef" sub
syscall: drop readahead wrapper
build: do not manify DS and Syscall pods
DS: handle EINTR in IO::Poll path, too
DS: workaround IO::Kqueue EINTR (mis-)handling
DS: drop profiling support
DS: remove unused fields and functions
listener: use EPOLLEXCLUSIVE for listen sockets
bundle Danga::Socket and Sys::Syscall
|
|
* origin/wwwlisting:
www: support listing of inboxes
start depending on Perl 5.10.1+
|
|
These modules are unmaintained upstream at the moment, but I'll
be able to help with the intended maintainer once/if CPAN
ownership is transferred. OTOH, we've been waiting for that
transfer for several years, now...
Changes I intend to make:
* EPOLLEXCLUSIVE for Linux
* remove unused fields wasting memory
* kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615
* accept4 support
And some lower priority experiments:
* switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes)
* nginx-style buffering to tmpfile instead of string array
* sendfile off tmpfile buffers
* io_uring maybe?
|
|
I'm using this as the cgit about-filter and source-filter
in https://80x24.org/public-inbox.git
|
|
Incomplete at the moment, but this ought to be a handy reference
for both implementers and users alike.
|
|
We will still return a 404 by default to '/' for compatibility
with users of Plack::App::Cascade or similar. Inboxes are
sorted by modification times to help users detect activity
(similar to the /$INBOX/ topic view).
New configuration options:
* publicinbox.wwwlisting - configure the listing type
* publicinbox.<name>.hide - hide a particular inbox from the listing
See changes to public-inbox-config.pod for full descriptions
of the new options.
Requested-by: Leah Neukirchen <leah@vuxu.org>
https://public-inbox.org/meta/871sdfzy80.fsf@gmail.com/
|
|
We depend on git-http-backend for smart HTTP clone support,
however, since cgit does not support smart clones natively.
WWW.pm will be able to cascade down to this as a 404 handler in
the future.
|
|
Fixes: 285b9b4d7de53b0d ("examples/newswww.psgi: demonstrate standalone NewsWWW usage")
|
|
This is the fallback for the normal WWW endpoint.
Adding this to the top-level seems to be alright, since lynx and
w3m both understand nntp://<HOSTNAME>/<Message-ID> anyways.
If newsgroup and inbox names conflict, then consider it the
fault of the original sender.
Since NewsWWW is intended to support buggy linkifiers in mail clients,
they can interpret nntp:// URLs as http://<HOSTNAME>/<Message-ID>
Inbox ordering from the config file is preserved since
commit cfa8ff7c256e20f3240aed5f98d155c019788e3b
("config: each_inbox iteration preserves config order"),
so admins can rely on that to configure how scanning
works.
Requested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
cf. https://public-inbox.org/meta/20190107190719.GE9442@pure.paranoia.local/
nntp://news.public-inbox.org/20190107190719.GE9442@pure.paranoia.local
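The fallback lookup described above could be sketched roughly as follows. This is only an illustration in Python, not the NewsWWW code: the function name, the dictionary keys and the in-memory `message_ids` sets are all hypothetical stand-ins for real inbox lookups. The point it shows is that inboxes are scanned in config order, so the first match wins.

```python
from urllib.parse import quote


def newswww_redirect(path, inboxes):
    """Return a redirect URL for `path`, or None to fall through to a 404.

    `inboxes` is a list of dicts in config-file order (hypothetical shape):
    {"newsgroup": str, "url": str, "message_ids": set of str}
    """
    target = path.strip("/")
    if not target:
        return None
    # First treat the path as a newsgroup name.
    for inbox in inboxes:
        if inbox["newsgroup"] == target:
            return inbox["url"]
    # Otherwise treat it as a Message-ID and scan inboxes in config order.
    for inbox in inboxes:
        if target in inbox["message_ids"]:
            return inbox["url"] + quote(target, safe="@") + "/"
    return None
```

Because the scan stops at the first hit, an admin who lists a preferred inbox earlier in the config decides which inbox a shared Message-ID resolves to.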
|
|
* origin/purge:
implement public-inbox-purge tool
v2writable: read epoch on purge
v2writable: cleanup processes when done
v2writable: purge ignores non-existent git epoch directories
v2writable: ->purge returns undef on no-op
import: purge: reap fast-export process
hoist out resolve_repo_dir from -index
|
|
I'll probably expose the PSGI service for cgit;
but it could be useful to others as well.
|
|
Since we now support more CSS classes for coloring,
give this feature more visibility.
|
|
Maybe we'll default to a dark theme to promote energy savings...
See contrib/css/README for details
|
|
|
|
This will look up git blobs from associated git source code
repositories. If the blobs can't be found, an attempt to
"solve" them via patch application will be performed.
Eventually, this may become the basis of a type-agnostic
frontend similar to "git show".
|
|
This will be necessary to ensure we maintain reasonable
performance when we add diff-highlighting support.
|
|
Expose the ->purge functionality of V2Writable for rewriting
git history to permanently purge messages from history. This
may be necessary for legal reasons.
Usage:
# requires ~/.public-inbox/config
public-inbox-purge --all </path/to/message-to-purge
# good for testing with unconfigured inboxes:
public-inbox-purge $INBOX_DIR </path/to/message-to-purge
|
|
We'll be using it in future admin tools, and making this
easier-to-test.
|
|
Clearly the AltId stuff was never tested for v2. Ensure
this tricky filter (which reuses Msgmap to avoid introducing
new serial numbers) doesn't trigger SQLite deadlocks due
to opening a DB for writing multiple times.
I went through several iterations of this change before
going with this one, which is the least intrusive I could
find.
|
|
Remove confusing documentation around ssoma now that we
have NNTP and downloadable mbox support.
Only lightly-checked for grammar and speling, and not yet
formatting. Edits, corrections and addendums expected :>
|
|
I've found two examples on https://lore.kernel.org/lkml/
where the messages declared themselves to be "multipart/mixed"
but were actually plain text:
<87llgalspt.fsf@free.fr>
<200308111450.h7BEoOu20077@mail.osdl.org>
With the mboxrd downloaded, mutt is able to view them without
difficulty.
Note: this change would require reindexing of Xapian to pick up
the changes. But it's only two ancient messages, the first was
resent by the original sender and the second is too old to be
relevant.
|
|
Extracted from import_slrnspool, since some spools get converted
to mbox or what not.
|
|
This reuses some of the configuration from -watch, but remains
independent since some configurations will use -watch for some
inboxes and -mda for others.
The default remains "spamc" for -mda users so nothing changes
without explicit configuration.
Per-inbox configurations may also be supported in the future.
|
|
We must not clobber the original message string, as Email::MIME(*)
still needs it for iterating through parts in SearchIdx (but not
when handing it as a raw string to git-fast-import).
I've noticed message bodies (especially dfpre/dfpost) were not
getting indexed when going through -mda (no problems with
-watch). This also did not affect v1 repos, since indexing is a
separate process for v1 and requires re-reading the data from
git.
(*) tested Email::MIME 1.937 on Debian stretch
|
|
Decrement regen_down when visiting messages that appear in %D that we
know will later be deleted. This ensures consistent message numbers are
generated no matter which commit number is on top, allowing deletes to
propagate separately from the messages they delete without causing
problems.
The v2 trees already do this, and when the indexes are deleted and
rebuilt they maintain their commit numbers.
Add a v1 version of the v2reindex test to verify that reindexing is
working properly on v1 as well as v2.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
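A rough sketch of the numbering idea, written in Python for illustration and not taken from the indexer itself. It assumes history is walked newest-first as a flat list of hypothetical `(status, blob)` pairs. The `deleted` set plays the role of %D and `num` plays the role of regen_down.

```python
def assign_numbers_newest_first(changes, total_adds):
    """Map surviving blobs to article numbers while walking newest-first.

    `changes` is a list of ("A" | "D", blob_id) pairs, newest first
    (a hypothetical flattening of the git log).
    `total_adds` is the number of "A" entries in the whole history.
    """
    deleted = set()     # blobs we know are deleted later in history
    num = total_adds    # counts down as additions are visited
    numbers = {}
    for status, blob in changes:
        if status == "D":
            deleted.add(blob)
        elif status == "A":
            if blob in deleted:
                # Deleted later: do not index it, but forget the blob ...
                deleted.discard(blob)
            else:
                numbers[blob] = num
            # ... and still consume its number so the others stay stable.
            num -= 1
    return numbers
```

With this scheme, messages "a" and "c" keep the same numbers whether or not the commit deleting "b" has been fetched yet.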
|
|
Recently I ran git --git-dir=lkml/git/1.git fsck
and it reported:
> warning in commit 299dbd50b6995c6debe2275f0df984ce697fb4cc: nulInCommit: NULL byte in the commit object body
Which I found quite scary. Nulls in the wrong place have a bad tendency
to make programs misbehave.
It turns out someone had placed "=?iso-8859-1?q?=00?=" at the end of
their subject line, which is the MIME encoding for NUL. Email::MIME
had correctly decoded the header, and then public-inbox had simply
copied the contents of the header into the subject line of the git
commit.
To prevent that from causing problems, replace nulls in such subject
lines with spaces.
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
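For illustration only, here is the same problem and fix reproduced with Python's standard email package rather than Email::MIME. The function names are made up for this sketch.

```python
from email.header import decode_header, make_header


def decode_subject(raw):
    """Decode RFC 2047 encoded-words in a raw Subject header value."""
    return str(make_header(decode_header(raw)))


def sanitize_subject(subject):
    """Replace NUL characters with spaces before using the subject elsewhere."""
    return subject.replace("\x00", " ")


decoded = decode_subject("=?iso-8859-1?q?=00?=")
clean = sanitize_subject(decoded)
```

Here `decoded` still carries the NUL character that the sender encoded, while `clean` is safe to place in a commit message.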
|
|
Recently I had trouble cloning lkml/git/0.git because
git fsck on receive was failing. The output of git fsck was:
> Checking object directories: 100% (256/256), done.
> warning in commit 59173dc1fe67b113ace4ce83e7f522414b3e0404: badTimezone: invalid author/committer line - bad time zone
> warning in commit ff22aaff22eb4479e49e93f697e385f76db51c55: badTimezone: invalid author/committer line - bad time zone
> warning in commit 609b744909693f5f00aff5ed9928beeeee9ded2e: badTimezone: invalid author/committer line - bad time zone
> warning in commit 084572141db8e0d879428afb278bd338f2dbb053: badTimezone: invalid author/committer line - bad time zone
> warning in commit 789d204de27cd12c6da693d903390a241a1a4bca: badTimezone: invalid author/committer line - bad time zone
> warning in commit 0d9a65948b0c957007ca387cd56b690f9bab9c08: badTimezone: invalid author/committer line - bad time zone
> warning in commit f7468c42b4196ee6323afb373ab9323971c38d69: badTimezone: invalid author/committer line - bad time zone
> warning in commit 85e0cd6dd527cd55ad0440f14384529b83818228: badTimezone: invalid author/committer line - bad time zone
> warning in commit f31e19a2e772c9ed00728ef142af9c550ea5de6a: badTimezone: invalid author/committer line - bad time zone
> warning in commit 56eb7384443ef84e17e29504a304a071b189ae67: badTimezone: invalid author/committer line - bad time zone
> warning in commit e4470030471e6810414b9de5e3b52e16f2245d12: badTimezone: invalid author/committer line - bad time zone
> warning in commit f913b48caa097c3b2cb3f491707944f88d52d89f: badTimezone: invalid author/committer line - bad time zone
> warning in commit 4390f26923d572c6dab6cce8282c7cad5520d785: badTimezone: invalid author/committer line - bad time zone
> warning in commit 0f66db71a06bd7d651a0cd80877d8043b70fda20: badTimezone: invalid author/committer line - bad time zone
> warning in commit d71472c40b36dcdf0396afc9778f6137eea45887: badTimezone: invalid author/committer line - bad time zone
> warning in commit e8d3b19a91a2d86b6a91bd19dc811e851398b519: badTimezone: invalid author/committer line - bad time zone
> warning in commit afd9fc0cc87e56ed7736d633e17d0ef77817b3cc: badTimezone: invalid author/committer line - bad time zone
> warning in commit 811b3217708358cf1b75fba4602a64a426fce0f5: badTimezone: invalid author/committer line - bad time zone
> warning in commit e7a751a597c6f5e4770c61bdee6220d55a37cba9: badTimezone: invalid author/committer line - bad time zone
> warning in commit 3e32ad6192fe093e03e6b9346c3a90b16d9905c0: badTimezone: invalid author/committer line - bad time zone
> warning in commit 5e66b47528e79d3bbb769e137f036a1fa99cccf9: badTimezone: invalid author/committer line - bad time zone
> warning in commit d90d67d94ca47142670dff13fcb81ab7afab07bb: badTimezone: invalid author/committer line - bad time zone
> Checking objects: 100% (1711464/1711464), done.
> Checking connectivity: 1711464, done.
Upon examination with git show --pretty=raw all of the problem commits
had a time zone that was not 4 digits long. This time zone had been
passed straight from the Date line in the email into the author line
of the commit.
Looking into that I discovered that str2time takes into account the
time zone, and was actually able to process these weird time zones.
So get the normalized time zone with strptime and convert it from
seconds from GMT to hours and minutes from GMT.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
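The conversion from an offset in seconds to the four-digit form git expects is simple arithmetic. A minimal Python sketch, with a hypothetical function name, assuming the offset has already been parsed out of the Date header:

```python
def format_git_tz(offset_seconds):
    """Format a UTC offset in seconds as a git-style +HHMM / -HHMM zone."""
    sign = "-" if offset_seconds < 0 else "+"
    total_minutes = abs(offset_seconds) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{sign}{hours:02d}{minutes:02d}"
```

For example, an offset of 19800 seconds becomes "+0530" and -28800 seconds becomes "-0800", both of which are always sign plus four digits.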
|
|
Followup-to: 73cfed86d8a8287a
("www: use undecoded paths for Message-ID extraction")
Reported-by: Leah Neukirchen <leah@vuxu.org>
https://public-inbox.org/meta/8736xsb5s5.fsf@vuxu.org/
|
|
This adds a SELinux policy suitable for RHEL/CentOS 7. It assumes the
following:
- public-inbox-httpd and public-inbox-nntpd are running via systemd
on sane ports (119 and 80/8080)
- /var/lib/public-inbox is the location for mainrepos
- /var/run/public-inbox is the location for PERL_INLINE_DIRECTORY
- /var/log/public-inbox is the location for logs
- mail delivery is done via postfix-pipe or public-inbox-watch via
the provided example systemd service
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
|
|
I guess I forgot to include this, but I've been running
public-inbox-watch as a systemd service for nearly two
years, now.
|
|
Some messages to git@vger went missing from Msgmap from old bugs
and became inaccessible via NNTP. Forcing NNTP article numbers
when the overview DB came about made the problem more visible when
reindexing old (v1) repositories as all removed spam messages
took up AUTOINCREMENT numbers again before they were removed.
Having large gaps in NNTP article numbers is not good since it
throws off NNTP clients. This does NOT prevent NNTP clients from
seeing some messages twice, but is better than having them
miss several messages entirely.
We also avoid depending on --reverse in git-log, as
git requires storing an entire commit list in memory for
--reverse, so it's cheaper to store only deleted blobs in the %D
hash since they do not live long.
|
|
While hunting duplicates, I noticed a leading '-' in some
Message-IDs as a result of RFC4648 encoding. While '-' seems
allowed by RFC5322 and URL-friendly (RFC4648), they are uncommon
and make using Message-IDs as arguments for command-line tools
more difficult. So prefix them with a datestamp to at least
give readers some sense of the age. And shorten the "localhost"
hostname to "z" to save space.
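A sketch of what such a generated Message-ID might look like, in Python. The digest algorithm, the timestamp format and the "." separator are assumptions made for this example. Only the datestamp prefix, the URL-safe base64 encoding and the "z" hostname come from the description above.

```python
import base64
import hashlib
import time


def fallback_mid(content, timestamp):
    """Build a Message-ID from a content digest, prefixed with a datestamp.

    `content` is bytes, `timestamp` is seconds since the epoch.
    SHA-256 and the YYYYMMDDHHMMSS format are assumptions of this sketch.
    """
    digest = hashlib.sha256(content).digest()
    # RFC 4648 URL-safe alphabet; it may start with '-' on its own.
    b64 = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(timestamp))
    return f"{stamp}.{b64}@z"
```

Since the result always begins with a digit, it can no longer be mistaken for a command-line switch.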
|
|
Since the overview stuff is a synchronization point anyways,
move it into the main V2Writable process and allow us to
drop a bunch of code. This is another step towards making
Xapian optional for v2.
In other words, the fan-out point is moved and the Xapian
partitions no longer need to synchronize against each other:
Before:
/-------->\
/---------->\
v2writable -->+----parts----> over
\---------->/
\-------->/
After:
/---------->
/----------->
v2writable --> over-->+----parts--->
\----------->
\---------->
Since the overview/threading logic needs to run on the same core
that feeds git-fast-import, it's slower for small repos but is
not noticeable in large imports where I/O wait in the partitions
dominates.
|
|
There's enough gmane links out there in the wild that it makes sense
to maintain support for these mappings.
|
|
This is important for people running mirrors via "git fetch",
as they need to be kept up-to-date. Purging is also now
supported in mirrors.
The short-lived "--regenerate" option is gone and is now
implicitly enabled as a result. It's still cheap when
article number regeneration is unnecessary, as we track
the range for each git repository.
|
|
While SQLite is faster than Xapian for some queries we
use, it sucks at handling OFFSET. Fortunately, we do
not need offsets when retrieving sorted results and
can bake it into the query.
For inbox.comp.version-control.git (v1 Xapian),
XOVER and XHDR are over 20x faster.
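The general technique of baking the position into the query instead of using OFFSET can be sketched with Python's sqlite3 module. The `overview` table and its columns are invented for this example and are not the actual schema.

```python
import sqlite3


def make_demo_db(nums):
    """Create an in-memory table with one row per article number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE overview (num INTEGER PRIMARY KEY, subject TEXT)")
    conn.executemany(
        "INSERT INTO overview (num, subject) VALUES (?, ?)",
        [(n, f"subject {n}") for n in nums],
    )
    return conn


def fetch_after(conn, last_num, limit):
    """Fetch the next `limit` rows after `last_num` without using OFFSET."""
    return conn.execute(
        "SELECT num, subject FROM overview WHERE num > ? ORDER BY num LIMIT ?",
        (last_num, limit),
    ).fetchall()


def iter_all(conn, batch=2):
    """Walk the whole table in batches, resuming from the last number seen."""
    last = 0
    while True:
        rows = fetch_after(conn, last, batch)
        if not rows:
            return
        yield from rows
        last = rows[-1][0]
```

Each batch starts from an index lookup on `num`, so the cost of a batch does not grow with how far into the result set it is.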
|
|
There'll be more performance-related tests in the future.
|