The packaged Perl on OpenBSD i386 supports 64-bit file offsets
but lacks 64-bit integer support for `q' and `Q' with `pack'.
Since servers aren't likely to require lock files larger than
2 GB (we'd need an inbox with >2 billion messages), we can
work around the Perl build limitation with explicit padding.
File::FcntlLock isn't packaged for OpenBSD <= 7.4 (though it
should be in future releases), and I can test i386 OpenBSD on an
extremely slow VM.
Big endian support can be done, too, but I have no idea if
there are 32-bit BE users around nowadays...
|
|
The good news (compared to lei) is we only have to worry about
imports and don't care about the filename or keywords, so it's
immune to .mh_sequences writing inconsistencies across MH
implementations and sequence number packing.
We still assume the writer will write the mail file with one of:
* rename(2) to create the final sequence number filename
* a single write(2) if not relying on rename(2)
mlmmj and mutt satisfy these requirements. Python's Lib/mailbox.py
may, I'm not sure...
|
|
The MH format is widely-supported and used by various MUAs such
as mutt and sylpheed, and an MH-like format is used by mlmmj for
archives, as well. Locking implementations for writes are
inconsistent, so this commit doesn't support writes, yet.
inotify|EVFILT_VNODE watches aren't supported, yet, but that'll
have to come since MH allows packing unused integers and
renaming files.
|
|
This is a step towards improving the out-of-the-box experience
in achieving notifications without XS, extra downloads, and .so
loading + runtime mmap overhead.
This also fixes loongarch support of all Linux syscalls due to
a bad regexp :x
All the reachable Linux architectures listed at
<https://portal.cfarm.net/machines/list/> should be supported.
At the moment, there appears to be no reachable sparc* Linux
machines available to cfarm users.
Fixes: b0e5093aa3572a86 (syscall: add support for riscv64, 2022-08-11)
|
|
The HTML is still extremely rough, but links seem to be mostly
working...
|
|
The C++ version will allow us to take full advantage of Xapian's
APIs for better queries, and the Perl bindings version can still
be advantageous in the future since we'll be able to support
timeouts effectively.
|
|
It ought to help a bit with organization since xap_helper.h
is getting somewhat large and we'll need new endpoints to
support WWW, lei, and whatever else needs to come.
|
|
The association data is just stored as deflated JSON in Xapian
metadata keys of shard[0] for now. It should be reasonably
compact and fit in memory, since we'll assume sane,
non-malicious git coderepo history.
The new cindex-join.t test requires TEST_REMOTE_JOIN=1 to be
set in the environment and tests the joins against the inboxes
and coderepos of two small projects with a common history.
Internally, we'll use `ibx_off', `root_off' instead of `ibx_id'
and `root_id' since `_id' may be mistaken for columns in an SQL
database which they are not.
|
|
This seems like an easy (but WWW-specific) way to get recently
created and recently active topics as suggested by Konstantin.
Doing this with Xapian would require new columns and
reindexing; and I'm not sure the current lei approach of
dumping search results to a format readable by common
MUAs would work well with this. A new TUI may be required...
Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20231107-skilled-cobra-of-swiftness-a6ff26@meerkat/
|
|
This is a pretty convenient way to create files for diff
generation in both WWW and lei. The test suite should also be
able to take advantage of it.
|
|
This fixes two major problems with the use of tie for filehandles:
* no way to do fcntl, stat, etc. calls directly on the tied handle,
forcing callers to use the `tied' perlop to access the underlying
IO::Handle
* needing separate classes to handle blocking and non-blocking I/O
As a result, Git->cleanup_if_unlinked, InputPipe->consume,
and Qspawn->_yield_start have fewer bizarre bits and we
can call `$io->blocking(0)' directly instead of
`(tied *$io)->{fh}->blocking(0)'
Having a PublicInbox::IO class will also allow us to support
custom read buffering which allows inspecting the current state.
|
|
Now that psgi_yield is used everywhere, the more complex
psgi_return and its helper bits can be removed. We'll also fix
some outdated comments now that everything on psgi_return has
switched to psgi_yield. GetlineResponse replaces GetlineBody
and does a better job of isolating generic PSGI-only code.
|
|
This is intended to replace psgi_return and HTTPD/Async
entirely, hopefully making our code less convoluted while
maintaining the ability to handle slow clients on
memory-constrained systems.
This was made possible by the philosophy shift in commit 21a539a2df0c
(httpd/async: switch to buffering-as-fast-as-possible, 2019-06-28).
We'll still support generic PSGI via the `pull' model with a
GetlineResponse class which is similar to the old GetlineBody.
|
|
A pipe requires more context switches, syscalls, and code to
deal with unpredictable pipe EOF vs waitpid ordering. So just
use the new spawn/aspawn features to automatically handle
slurping output into a string.
|
|
It's slightly better organized this way, especially since
`publicinboxLimiter' has its own user-facing config section
and knobs. I may use it in LeiMirror and CodeSearchIdx for
process management.
|
|
This is required for reliable epoll/kevent/poll/select
wakeup notifications, since we have no visibility into
the buffer states used internally by Perl.
We can safely use sysread here since we never use the :utf8
nor any :encoding Perl IO layers for readable pipes.
I suspect this fixes occasional failures from t/solver_git.t
when retrieving the WwwCoderepo summary.
|
|
Since we deal with pipes (of either direction) and bidirectional
stream sockets for this class, it's better to remove the `Pipe'
from the name and replace it with `IO' to communicate that it
works for any form of IO::Handle-like object tied to a process.
|
|
It's safer against deadlocks and we still get proper error
reporting by passing stderr across in addition to the lei
socket.
|
|
`initial_limit' only affects the fetch for new users while
`limit' affects all users. `initial_limit' is intended to be
better than the existing, absolute `limit' for 24/7 servers
(e.g. scheduled POP3 imports via webmail). The existing `limit'
parameter remains and is best suited for systems with sporadic
Internet access.
This also fixes an offset calculation bug with limit when used
on the newest (non-full) slice. The number of messages in the
newest slice is now taken into account.
Link: https://public-inbox.org/meta/20230918-barrel-unhearing-b63869@meerkat/
Fixes: 392d251f97d4 (pop3: support `?limit=$NUM' parameter in mailbox name)
|
|
This allows us to get rid of some duplication in our Makefile
|
|
This non-parallelized target is useful for noticing core dumps
and preventing them from being clobbered as we run the test
suite. It will also notice leftover temporary files and
directories.
This make target was used on OpenBSD 7.3 to develop at least two
recent fixes:
e281363ba937 (lei: ensure we run DESTROY|END at daemon exit w/ kqueue)
759885e60e59 (lei: ensure --stdin sets %ENV and $current_lei)
I considered using a per-test TMPDIR for this to enable
parallelization, but on-filesystem UNIX sockets won't work
with excessively long path names.
|
|
deps.perl can be useful for non-CI purposes as long as it's not
blindly removing packages. Thus, a --allow-remove flag now
exists for CI use and removals are disabled by default.
deps.perl also gets easier to use now that install/os.perl
is split off from ci/profiles.perl to detect the OS-supplied
package manager.
|
|
This is safer than relying on an internal API of IO::Poll
and doesn't create extra references to IO globs like the
public one.
|
|
This allows us to cut down on imports and reduce code.
This also makes it easier (in the next commit) to provide an option
to disable epoll/kqueue when saving an FD is valued over scalability.
|
|
Reading os-release(5) is a bit more painful, now; and still
requires using the shell. However, sharing code between *BSDs
and being able to use v-strings for version comparisons is much
easier.
Test profiles for *BSDs are also trimmed down and more focused
on portability stuff.
|
|
We use it to dump SIGWINCH and _SC_NPROCESSORS_ONLN, so
"sysdefs" is a more appropriate list for *BSD users.
|
|
This is no longer needed since xap_helper performs its
functionality while having an optional C++ implementation
which is significantly faster.
|
|
It's now just `dump_roots' instead of `dump_shard_roots', since
this doesn't need to be tied to the concept of shards. I'm
still shaky with C++, but intend to keep using stuff like
hsearch(3) to make life easier for C hackers :P
|
|
This allows us to perform the expensive "dump_ibx" operations in
native C++ code using the Xapian C++ library. This provides the
majority of the speedup with the -cindex --associate switch.
Eventually this may be expanded to cover all uses of Xapian
within the project to ensure we have access to Xapian APIs which
aren't available in XS|SWIG bindings; and also for
ease-of-installation on systems which don't provide
pre-packaged Perl Xapian bindings (e.g. OpenBSD 7.3) but
do provide Xapian development libraries.
Most of the C++ code is still C, as I'm not remotely familiar
with C++ compared to C. I suspect many users and potential
hackers coming from the git, Linux kernel, and glibc world are
in the same boat.
|
|
This will eventually allow associating coderepos with inboxes
and vice-versa; avoiding the need for manual configuration via
tedious publicinbox.*.coderepo directives.
I'm not sure how this should be stored for WWW, yet, but it's
required since it takes about 8 hours to do this fully across
lore and git.kernel.org.
|
|
It's similar to a combination of -index and -extindex but
perhaps more refined this time around...
|
|
Buffered readline (and read) ops under Perl 5.36.0 fail to read
new data after writes are made by other file handles (or
processes).
To fix and improve our test, introduce a new, (currently)
test-only TailNotify class which uses inotify or kevent if
available to work around it while avoiding infinite polling
loops. Further refinements to these test APIs may follow,
since we use the same pattern for testing daemons in many
places.
This also fixes the TEST_KILL_IMAPD condition in t/imapd.t under
GNU/Linux, AFAIK that test was never reliable under FreeBSD.
Link: https://bugs.debian.org/1040947
|
|
This will make it easier to switch in the far future while
making callers easier-to-read (and more callers will be added).
Anyways, Perl 5.26 is a long time away for enterprise users;
but isolating compatibility code away can improve readability
of code we actually care about in the meantime.
|
|
With my partial git.kernel.org mirror, this brings a full prune
down from ~75 minutes to under 5 minutes using git 2.19+. This
speedup even applies to users on slow storage (rotational HDD).
First off, xapian-delve(1) is nearly 10x faster for dumping
boolean terms by prefix than the equivalent Perl code with
Xapian bindings. This performance difference is critical since
we need to check over 5 million commits for pruning a partial
git.kernel.org mirror.
We can use sed(1) and sort(1) to massage delve output into
something suitable for the first comm(1) input.
For the second comm(1) input, the output of `git cat-file
--batch-check --batch-all-objects' against all indexed git repos
with awk(1) filtering provides the necessary output for
generating a list of indexed-but-no-longer accessible commits.
sed(1) and awk(1) are POSIX standard tools which can be roughly
2x faster than equivalent Perl for simple filters, while
sort(1) is designed to handle larger-than-memory datasets
efficiently (unlike the `sort' perlop).
With slow storage and git <2.19, the switch to --batch-all-objects
actually results in a performance regression, since having git
perform sorting results in worse disk locality than the previous
sequential iteration by Xapian docid. git 2.19+ users with
`--unordered' support benefit from improved storage locality;
and the speedups from storage locality dwarf the overhead of
an extra external sort(1) invocation.
Even with consumer-grade SATA-II SSDs, the combo of --unordered
and sort(1) provides a noticeable speedup since SSD latency
remains a factor for --batch-all-objects.
git <2.19 users must upgrade git to get acceptable performance
on slow storage and giant indexes, but git 2.19 was released
nearly 5 years ago so it's probably a reasonable requirement for
performance.
The only remaining downside of this change for all users is
the extra temporary disk space for sort(1) and comm(1);
but the speedup provided with git 2.19+ is well worth it.
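The comm(1) step can be illustrated with a self-contained
simulation (fake object IDs stand in for the real xapian-delve
and `git cat-file --batch-check --batch-all-objects --unordered'
outputs described above):

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# first comm(1) input: commits currently indexed (as dumped from
# xapian-delve and massaged via sed(1) and sort(1))
printf '%s\n' aaa bbb ccc | sort >indexed
# second comm(1) input: commits still reachable in the indexed git
# repos (as filtered from `git cat-file ...' output via awk(1))
printf '%s\n' aaa ccc | sort >reachable
# lines unique to the first input are indexed-but-unreachable,
# i.e. the candidates for pruning
comm -23 indexed reachable
```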
|
|
Since CodeSearchIdx doesn't deal with inboxes, it makes sense
to split it out from inbox-specific code and start moving
towards using OnDestroy to restore the umask at the end of
scope and reducing extra functions.
|
|
`git log -p' can take several seconds to generate its initial output.
SMP systems can be processing prunes during this delay, so let
DS do a one-shot notification for us while prune is running. On
Linux, we'll also use the biggest pipe possible so git can do
more CPU-intensive work to generate diffs while our Perl
processes are indexing and likely hitting I/O wait.
|
|
As encountered by Louis DeLosSantos, Linux inotify is capped by
a lesser-known limit than the standard RLIMIT_NOFILE (`ulimit -n`)
value. Give the user a hint about the fs.inotify.max_user_instances
sysctl knob on EMFILE, since EMFILE alone may mislead users into
thinking they've hit the (typically higher) RLIMIT_NOFILE limit.
I can test this on my system using:
perl -I lib -MPublicInbox::Inotify -E \
'my @x = map { PublicInbox::Inotify->new } (1..128)'
But I hesitate to include it in the test suite since triggering
the limit can cause unrelated processes to fail.
Link: https://public-inbox.org/meta/CAE6jdTo8iQfNM9Yuk0Dwi-ARMxmQxX-onL8buXcQ9Ze3r0hKrg@mail.gmail.com/
Reported-by: Louis DeLosSantos <louis.delos@gmail.com>
|
|
It seems relying on root commits is a reasonable way to
deduplicate and handle repositories with common history.
I initially wanted to shoehorn this into extindex, but decided a
separate Xapian index layout capable of being EITHER external to
handle many forks or internal (in $GIT_DIR/public-inbox-cindex)
for small projects is the right way to go.
Unlike most existing parts of public-inbox, this relies on
absolute paths of $GIT_DIR stored in the Xapian DB and does not
rely on the config file. We'll be relying on the config file to
map absolute paths to public URL paths for WWW.
|
|
Hopefully this makes things less surprising to new hackers.
|
|
I'll have to downsize the server due to increased hosting costs,
so stop advertising these mirrors.
The inboxes still exist, for now; but they will probably be
proxied behind an ssh tunnel via a slow DSL connection, so it's
not worth increasing traffic to them.
|
|
t/run.perl now prints the 10 slowest tests at startup, and I've
added ./devel/longest-tests to print all tests sorted by
elapsed time.
This should allow us to notice outliers more quickly in the
future.
|
|
On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as
the Digest::SHA implementation from Perl, most likely due to an
optimized assembly implementation. SHA-1 is a few percent
faster, too.
|
|
This is for compatibility with cgit to ease migration.
|
|
These will be reused in the web UI, too.
|
|
This should be similar or identical to what's in cgit;
and tie into the rest of the www_coderepo stuff.
|
|
This should be compatible with both grokmirror 1 and 2 behavior
and serialized on a per-repo basis.
|
|
It's not much, yet, but it's something for the corner cases
which I'm maybe not hitting under normal use.
|
|
I'm thinking the -nntpd regression fix will push this release
out sooner rather than later...
|
|
These should be compatible with cgit results
|
|
This will allow it to easily map a single coderepo to multiple
inboxes (or multiple coderepos to any number of inboxes).
For now, this is just a summary, but $REPO/$OID/s/ support
will be added, along with archive downloads.
Indexing of coderepos will probably be supported via -extindex,
only.
|