path: root/MANIFEST
Date        Commit message
2024-02-06  pop3d: support fcntl locks on OpenBSD i386
The packaged Perl on OpenBSD i386 supports 64-bit file offsets but lacks 64-bit integer support for the 'q' and 'Q' templates of `pack'. Since servers are unlikely to require lock files larger than 2 GB (we'd need an inbox with >2 billion messages), we can work around the Perl build limitation with explicit padding. File::FcntlLock isn't packaged for OpenBSD <= 7.4 (but should be in future releases), but I can test i386 OpenBSD on an extremely slow VM. Big-endian support can be done, too, but I have no idea if there are 32-bit BE users around nowadays...
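As a rough illustration of the padding trick (written with Python's `struct` module rather than Perl's `pack`, and not the project's actual code), a little-endian 64-bit offset field below 2**31 can be built from a 32-bit signed integer followed by four zero bytes. The function name is hypothetical, and the surrounding `struct flock` layout is not modeled here.

```python
import struct

def pack_off64_le(offset: int) -> bytes:
    """Emit an 8-byte little-endian offset using only 32-bit templates.

    Low word first ('l'), then four pad bytes ('4x') standing in for the
    high word, which must be zero. Only valid below 2**31, matching the
    "no lock file over 2 GB" assumption."""
    if not 0 <= offset < 2**31:
        raise ValueError("offset does not fit in 31 bits")
    return struct.pack("<l4x", offset)

# For in-range offsets this matches a genuine 64-bit pack:
assert pack_off64_le(12345) == struct.pack("<q", 12345)
```

A big-endian variant would put the padding first (">4xl"), since the zero high word comes before the low word there.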
2024-01-30  watch: support incremental updates from MH
The good news (compared to lei) is we only have to worry about imports and don't care about the filename or keywords, so it's immune to .mh_sequences writing inconsistencies across MH implementations and to sequence number packing. We still assume the writer will write the mail file with one of:
* rename(2) to create the final sequence number filename
* a single write(2) if not relying on rename(2)
mlmmj and mutt satisfy these requirements. Python's Lib/mailbox.py may; I'm not sure...
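A minimal Python sketch (hypothetical names, not the project's Perl) of why ignoring filenames makes an importer immune to packing: if each message file is keyed by a digest of its contents rather than by its integer name, renumbering a folder does not trigger re-imports. It relies on the writer assumptions above, so a file is never hashed while half-written; a real watcher would react to inotify/kqueue events instead of rescanning.

```python
import hashlib
import os
from typing import List, Set

def unseen_mh_files(mh_dir: str, seen: Set[str]) -> List[str]:
    """Return paths of MH message files whose content digest is not in seen.

    Only purely numeric filenames are messages; .mh_sequences and other
    files are skipped. seen is updated in place."""
    fresh = []
    names = [n for n in os.listdir(mh_dir) if n.isascii() and n.isdigit()]
    for name in sorted(names, key=int):  # numeric, not lexical, order
        path = os.path.join(mh_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(path)
    return fresh
```

Renaming "2" to "7" after a first scan yields nothing new on the next scan, since the contents were already seen.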
2023-12-30  lei: support reading MH for convert+import+index
The MH format is widely supported and used by various MUAs such as mutt and sylpheed, and an MH-like format is used by mlmmj for archives as well. Locking implementations for writes are inconsistent, so this commit doesn't support writes yet. inotify|EVFILT_VNODE watches aren't supported yet either, but they'll have to come since MH allows packing unused integers and renaming files.
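For the reading side, MH keeps per-message state in `.mh_sequences` as lines like `unseen: 1-3 7`. The Python helper below is a hypothetical, read-only sketch of parsing that file; it tolerates junk tokens (since writers are inconsistent) but does not handle the continuation lines some MH implementations may emit. How lei actually maps sequences to keywords is not shown.

```python
from typing import Dict, Set

def parse_mh_sequences(text: str) -> Dict[str, Set[int]]:
    """Parse .mh_sequences content such as "unseen: 1-3 7" into
    {"unseen": {1, 2, 3, 7}}. Malformed tokens are skipped, not fatal."""
    seqs: Dict[str, Set[int]] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "name: numbers" line
        nums = seqs.setdefault(name.strip(), set())
        for tok in rest.split():
            lo, dash, hi = tok.partition("-")
            try:
                if dash:
                    nums.update(range(int(lo), int(hi) + 1))  # inclusive range
                else:
                    nums.add(int(lo))
            except ValueError:
                continue  # tolerate junk written by other MUAs
    return seqs
```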
2023-12-29  pure Perl inotify support
This is a step towards improving the out-of-the-box experience in achieving notifications without XS, extra downloads, and .so loading + runtime mmap overhead.
This also fixes loongarch support for all Linux syscalls, which was broken by a bad regexp :x
All the reachable Linux architectures listed at <https://portal.cfarm.net/machines/list/> should be supported. At the moment, there appear to be no reachable sparc* Linux machines available to cfarm users.
Fixes: b0e5093aa3572a86 (syscall: add support for riscv64, 2022-08-11)
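The syscall numbers are architecture-specific, but the events read back from an inotify descriptor are a plain packed C structure (`struct inotify_event`). The Python sketch below (not the project's Perl; function name hypothetical) decodes such a read buffer using only `struct`; obtaining the descriptor via inotify_init1 and inotify_add_watch is left out.

```python
import os
import struct
from typing import Iterator, Tuple

# Header of struct inotify_event: int wd; uint32_t mask, cookie, len;
# "=" selects native byte order with standard 4-byte sizes and no padding.
_EVENT_HDR = struct.Struct("=iIII")

def parse_inotify_events(buf: bytes) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (wd, mask, cookie, name) for each event in a buffer returned by
    read(2) on an inotify descriptor. The name field is `len` bytes long,
    NUL-terminated and NUL-padded; it is empty for events on the watched
    object itself."""
    off = 0
    while off + _EVENT_HDR.size <= len(buf):
        wd, mask, cookie, nlen = _EVENT_HDR.unpack_from(buf, off)
        off += _EVENT_HDR.size
        name = buf[off:off + nlen].split(b"\0", 1)[0]
        off += nlen
        yield wd, mask, cookie, os.fsdecode(name)
```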
2023-11-29  www: start working on a repo listing
The HTML is still extremely rough, but links seem to be mostly working...
2023-11-29  xap_helper: implement mset endpoint for WWW, IMAP, etc...
The C++ version will allow us to take full advantage of Xapian's APIs for better queries, and the Perl bindings version can still be advantageous in the future since we'll be able to support timeouts effectively.
2023-11-29  xap_helper.h: move cindex endpoints to separate file
It ought to help a bit with organization since xap_helper.h is getting somewhat large and we'll need new endpoints to support WWW, lei, and whatever else needs to come.
2023-11-21  cindex: rename --associate to --join, test w/ real repos
The association data is just stored as deflated JSON in Xapian metadata keys of shard[0] for now. It should be reasonably compact and fit in memory, since we'll assume sane, non-malicious git coderepo history.
The new cindex-join.t test requires TEST_REMOTE_JOIN=1 to be set in the environment and tests the joins against the inboxes and coderepos of two small projects with a common history.
Internally, we'll use `ibx_off' and `root_off' instead of `ibx_id' and `root_id', since `_id' may be mistaken for columns in an SQL database, which they are not.
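A Python sketch of the "deflated JSON" encoding step, assuming zlib framing (the text doesn't say whether raw deflate or zlib-wrapped data is used) and a hypothetical data shape keyed by `ibx_off`. JSON object keys must be strings, hence "0" rather than 0. With the Xapian Python bindings, the resulting bytes could then be stored via `WritableDatabase.set_metadata(key, value)`; the actual key names are not given here.

```python
import json
import zlib

def encode_join_data(data: dict) -> bytes:
    """Compact JSON, then deflate, suitable for a metadata value."""
    text = json.dumps(data, separators=(",", ":"), sort_keys=True)
    return zlib.compress(text.encode("utf-8"))

def decode_join_data(blob: bytes) -> dict:
    """Inverse of encode_join_data."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical shape: each inbox offset maps to the root offsets it joins.
joins = {"0": [0, 3], "1": [3]}
assert decode_join_data(encode_join_data(joins)) == joins
```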
2023-11-10  www: add topics_(new|active).(html|atom) endpoints
This seems like an easy (but WWW-specific) way to get recently created and recently active topics, as suggested by Konstantin.
Doing this with Xapian will require new columns and reindexing; and I'm not sure if the current lei handling of search results (dumping results to a format readable by common MUAs) would work well with this. A new TUI may be required...
Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20231107-skilled-cobra-of-swiftness-a6ff26@meerkat/
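The difference between the two endpoints comes down to the sort key: "new" ranks a topic by when it started, "active" by its latest message. A small Python sketch with an in-memory mapping (hypothetical input shape; the real endpoints query the inbox's own storage, which isn't shown):

```python
from typing import Dict, List, Tuple

def topics_new_and_active(threads: Dict[str, List[int]],
                          n: int) -> Tuple[List[str], List[str]]:
    """threads maps a topic ID to message timestamps (epoch seconds).
    "New" ranks by the first message's time; "active" by the latest."""
    nonempty = {t: ts for t, ts in threads.items() if ts}
    new = sorted(nonempty, key=lambda t: min(nonempty[t]), reverse=True)[:n]
    active = sorted(nonempty, key=lambda t: max(nonempty[t]), reverse=True)[:n]
    return new, active
```

An old thread with a fresh reply shows up near the top of "active" but not of "new".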
2023-11-03  io: introduce write_file helper sub
This is a pretty convenient way to create files for diff generation in both WWW and lei. The test suite should also be able to take advantage of it.
2023-11-03  replace ProcessIO with untied PublicInbox::IO
This fixes two major problems with the use of tie for filehandles:
* no way to do fcntl, stat, etc. calls directly on the tied handle, forcing callers to use the `tied' perlop to access the underlying IO::Handle
* needing separate classes to handle blocking and non-blocking I/O
As a result, Git->cleanup_if_unlinked, InputPipe->consume, and Qspawn->_yield_start have fewer bizarre bits, and we can call `$io->blocking(0)' directly instead of `(tied *$io)->{fh}->blocking(0)'.
Having a PublicInbox::IO class will also allow us to support custom read buffering, which allows inspecting the current state.
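Python has no `tie`, so this is only a loose analogy: the sketch shows the call shape the change enables, toggling non-blocking mode and reading directly on the real descriptor, and distinguishing "no data yet" (EAGAIN) from EOF. The helper name is hypothetical.

```python
import os
from typing import Optional

def try_read(fd: int, size: int = 4096) -> Optional[bytes]:
    """Read from a non-blocking descriptor; None means "no data yet"
    (EAGAIN), as opposed to b"" which means EOF."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None

r, w = os.pipe()
os.set_blocking(r, False)  # toggle directly on the descriptor, no wrapper
assert try_read(r) is None
os.write(w, b"hello")
assert try_read(r) == b"hello"
os.close(w)
assert try_read(r) == b""  # writer closed: EOF
os.close(r)
```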
2023-10-25  drop psgi_return, httpd/async and GetlineBody
Now that psgi_yield is used everywhere, the more complex psgi_return and its helper bits can be removed. We'll also fix some outdated comments now that everything on psgi_return has switched to psgi_yield.
GetlineResponse replaces GetlineBody and does a better job of isolating generic PSGI-only code.
2023-10-25  qspawn: introduce new psgi_yield API
This is intended to replace psgi_return and HTTPD/Async entirely, hopefully making our code less convoluted while maintaining the ability to handle slow clients on memory-constrained systems.
This was made possible by the philosophy shift in commit 21a539a2df0c (httpd/async: switch to buffering-as-fast-as-possible, 2019-06-28).
We'll still support generic PSGI via the `pull' model with a GetlineResponse class, which is similar to the old GetlineBody.
2023-10-25  psgi_qx: use a temporary file rather than pipe
A pipe requires more context switches, syscalls, and code to deal with unpredictable pipe EOF vs waitpid ordering. So just use the new spawn/aspawn features to automatically handle slurping output into a string.
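A synchronous Python sketch of the temporary-file approach (the real change does this asynchronously via spawn/aspawn, which isn't modeled; the function name is hypothetical): the child writes into an anonymous temporary file, and the parent reads it only after the child has exited, so there's no pipe EOF vs. waitpid ordering to handle.

```python
import subprocess
import sys
import tempfile

def qx_via_tempfile(cmd):
    """Run cmd with stdout going to an anonymous temporary file (unlinked
    on POSIX), wait for it, then slurp the file."""
    with tempfile.TemporaryFile() as tmp:
        proc = subprocess.run(cmd, stdout=tmp)
        tmp.seek(0)  # the child advanced the shared offset; rewind to read
        return proc.returncode, tmp.read()

rc, out = qx_via_tempfile([sys.executable, "-c", "print('hi')"])
```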
2023-10-25  limiter: split out from qspawn
It's slightly better organized this way, especially since `publicinboxLimiter' has its own user-facing config section and knobs. I may use it in LeiMirror and CodeSearchIdx for process management.
2023-10-08  introduce ProcessIONBF for multiplexed non-blocking IO
This is required for reliable epoll/kevent/poll/select wakeup notifications, since we have no visibility into the buffer states used internally by Perl. We can safely use sysread here since we never use the :utf8 layer nor any :encoding Perl IO layers for readable pipes.
I suspect this fixes occasional failures from t/solver_git.t when retrieving the WwwCoderepo summary.
2023-10-08  rename ProcessPipe to ProcessIO
Since we deal with pipes (of either direction) and bidirectional stream sockets for this class, it's better to remove the `Pipe' from the name and replace it with `IO' to communicate that it works for any form of IO::Handle-like object tied to a process.
2023-10-08  lei: always use async `done' requests to store
It's safer against deadlocks and we still get proper error reporting by passing stderr across in addition to the lei socket.
2023-09-22  pop3: support initial_limit parameter in mailbox name
`initial_limit' only affects the fetch for new users, while `limit' affects all users. `initial_limit' is intended to be better than the existing, absolute `limit' for 24/7 servers (e.g. scheduled POP3 imports via webmail). The existing `limit' parameter remains and is best suited for systems with sporadic Internet access.
This also fixes an offset calculation bug with `limit' when used on the newest (non-full) slice. The number of messages in the newest slice is now taken into account.
Link: https://public-inbox.org/meta/20230918-barrel-unhearing-b63869@meerkat/
Fixes: 392d251f97d4 (pop3: support `?limit=$NUM' parameter in mailbox name)
2023-09-20  rename t/run.perl to xt/check-run
This allows us to get rid of some duplication in our Makefile.
2023-09-16  Makefile: add `check-debris' target
This non-parallelized target is useful for noticing core dumps and preventing them from being clobbered as we run the test suite. It will also notice leftover temporary files and directories.
This make target was used on OpenBSD 7.3 to develop at least two recent fixes:
    e281363ba937 (lei: ensure we run DESTROY|END at daemon exit w/ kqueue)
    759885e60e59 (lei: ensure --stdin sets %ENV and $current_lei)
I considered using a per-test TMPDIR for this to enable parallelization, but on-filesystem UNIX sockets won't work with excessively long path names.
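The general idea behind such a target can be sketched in Python (this is not the Makefile recipe, which isn't shown): snapshot directory listings before and after a test run and report anything new. Diffing listings avoids having to match OS-specific core file names such as `core.PID` on Linux or `progname.core` on the BSDs. Function names are hypothetical.

```python
import os
from typing import Iterable, List, Set

def snapshot(dirs: Iterable[str]) -> Set[str]:
    """Collect every file and directory path currently under dirs."""
    seen = set()
    for top in dirs:
        for root, subdirs, files in os.walk(top):
            for name in subdirs + files:
                seen.add(os.path.join(root, name))
    return seen

def debris(before: Set[str], after: Set[str]) -> List[str]:
    """Paths that appeared during the run: core dumps, leftover temp dirs."""
    return sorted(after - before)
```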
2023-09-14  move deps.perl into new install/ directory
deps.perl can be useful for non-CI purposes as long as it's not blindly removing packages. Thus, a --allow-remove flag now exists for CI use, and removals are disabled by default.
deps.perl also gets easier to use in that install/os.perl is now split off from ci/profiles.perl so OS-supplied packaged manager.
2023-09-12  provide select(2) backend for PublicInbox::DS
This is safer than relying on an internal API of IO::Poll and doesn't create extra references to IO globs like the public one.
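For comparison, a single round of a select(2)-based readiness check in Python (not the PublicInbox::DS code; helper name hypothetical). Passing plain integer descriptors to the poller means it holds no references to file objects, which is the property the commit message cares about.

```python
import os
import select
from typing import List

def wait_readable(fds: List[int], timeout: float) -> List[int]:
    """One select(2) round: return which integer descriptors are readable.
    Only ints are handed to select, so no file objects are kept alive."""
    readable, _, _ = select.select(fds, [], [], timeout)
    return readable
```

An event loop would call this repeatedly and dispatch to per-descriptor callbacks.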
2023-09-11  ds: use object-oriented API for epoll
This allows us to cut down on imports and reduce code. This also makes it easier (in the next commit) to provide an option to disable epoll/kqueue when saving an FD is valued over scalability.
2023-09-09  ci/profiles: rewrite in Perl
Reading os-release(5) is a bit more painful, now; and still requires using the shell. However, sharing code between *BSDs and being able to use v-strings for version comparisons is much easier. Test profiles for *BSDs are also trimmed down and more focused on portability stuff.
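A Python sketch of the two pieces involved (the project uses Perl v-strings; these helpers are hypothetical): reading os-release(5) `KEY=value` lines, whose values may be shell-quoted, and comparing versions numerically so that "7.10" sorts after "7.9".

```python
import shlex
from typing import Dict, Tuple

def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release(5) KEY=value lines; values may be shell-quoted."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        parts = shlex.split(val)  # strips quotes and handles escapes
        out[key] = parts[0] if parts else ""
    return out

def version_tuple(v: str) -> Tuple[int, ...]:
    """Turn "13.2" into (13, 2) so comparisons are numeric, not lexical."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())
```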
2023-09-05  update devel/syscall-list to devel/sysdefs-list
We use it to dump SIGWINCH and _SC_NPROCESSORS_ONLN, so "sysdefs" is a more appropriate list for *BSD users.
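A rough Python equivalent of what such a dump reports (the actual script's implementation and output format aren't shown here): the numeric values of SIGWINCH and of the `_SC_NPROCESSORS_ONLN` sysconf name, both of which can differ between operating systems and architectures. Unix-only, since SIGWINCH doesn't exist on Windows.

```python
import os
import signal

def sysdefs() -> dict:
    """Collect platform-specific constant values so they can be compared
    across Linux and the BSDs, or hardcoded per-OS."""
    defs = {"SIGWINCH": int(signal.SIGWINCH)}
    if "SC_NPROCESSORS_ONLN" in os.sysconf_names:
        # the constant passed to sysconf(3), not the CPU count itself
        defs["_SC_NPROCESSORS_ONLN"] = os.sysconf_names["SC_NPROCESSORS_ONLN"]
    return defs

for name, val in sorted(sysdefs().items()):
    print(name, val)
```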
2023-08-24  drop unused CidxRecvIbx.pm
This is no longer needed since xap_helper performs its functionality and has an optional C++ implementation which is significantly faster.
2023-08-24  cindex: implement dump_roots in C++
It's now just `dump_roots' instead of `dump_shard_roots', since this doesn't need to be tied to the concept of shards. I'm still shaky with C++, but intend to keep using stuff like hsearch(3) to make life easier for C hackers :P
2023-08-24  introduce optional C++ xap_helper
This allows us to perform the expensive "dump_ibx" operations in native C++ code using the Xapian C++ library. This provides the majority of the speedup with the -cindex --associate switch.
Eventually, this may be expanded to cover all uses of Xapian within the project to ensure we have access to Xapian APIs which aren't available in XS|SWIG bindings, and also for ease of installation on systems which don't provide pre-packaged Perl Xapian bindings (e.g. OpenBSD 7.3) but do provide Xapian development libraries.
Most of the C++ code is still C, as I'm far less familiar with C++ than with C. I suspect many users and potential hackers coming from the git, Linux kernel, and glibc worlds are in the same boat.
2023-08-24  cindex: read-only association dump
This will eventually allow associating coderepos with inboxes and vice-versa, avoiding the need for manual configuration via tedious publicinbox.*.coderepo directives.
I'm not sure how this should be stored for WWW yet, but storing it is required since it takes about 8 hours to do this fully across lore and git.kernel.org.
2023-08-16  doc: add manpage for -cindex
It's similar to a combination of -index and -extindex but perhaps more refined this time around...
2023-07-13  t/imapd: workaround a Perl 5.36.0 readline regression
Buffered readline (and read) ops under Perl 5.36.0 fail to read new data after writes are made by other file handles (or processes).
To fix and improve our test, introduce a new, (currently) test-only TailNotify class which uses inotify or kevent, if available, to work around it while avoiding infinite polling loops. Further refinements to these test APIs are needed since we use the same pattern for testing daemons in many places.
This also fixes the TEST_KILL_IMAPD condition in t/imapd.t under GNU/Linux; AFAIK, that test was never reliable under FreeBSD.
Link: https://bugs.debian.org/1040947
2023-06-09  add compat package for List::Util::uniqstr
This will make it easier to switch in the far future while making callers easier-to-read (and more callers will be added). Anyways, Perl 5.26 is a long time away for enterprise users; but isolating compatibility code away can improve readability of code we actually care about in the meantime.
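The same compat pattern in Python terms (an analogy, not the project's code): prefer a library implementation when it's installed, and otherwise fall back to a local one with identical semantics, keeping the first element for each distinct string value in order. `more_itertools.unique_everseen` is a real third-party function; whether it's installed is the only assumption.

```python
try:
    # Third-party, if installed: same semantics when keyed by str
    from more_itertools import unique_everseen as _unique_everseen

    def uniqstr(items):
        return list(_unique_everseen(items, key=str))
except ImportError:
    def uniqstr(items):
        """Fallback: keep the first item for each distinct string value."""
        seen = set()
        out = []
        for x in items:
            key = str(x)
            if key not in seen:
                seen.add(key)
                out.append(x)
        return out
```

Callers only ever see `uniqstr`, so dropping the fallback later is a one-place change.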
2023-04-22  cindex: rewrite prune (again) for speed
With my partial git.kernel.org mirror, this brings a full prune down from ~75 minutes to under 5 minutes using git 2.19+. This speedup even applies to users on slow storage (rotational HDD).
First off, xapian-delve(1) is nearly 10x faster for dumping boolean terms by prefix than the equivalent Perl code with Xapian bindings. This performance difference is critical since we need to check over 5 million commits for pruning a partial git.kernel.org mirror.
We can use sed(1) and sort(1) to massage delve output into something suitable for the first comm(1) input.
For the second comm(1) input, the output of `git cat-file --batch-check --batch-all-objects' against all indexed git repos, filtered with awk(1), provides what we need for generating a list of indexed-but-no-longer-accessible commits.
sed(1) and awk(1) are POSIX standard tools which can be roughly 2x faster than equivalent Perl for simple filters, while sort(1) is designed to handle larger-than-memory datasets efficiently (unlike the `sort' perlop).
With slow storage and git <2.19, the switch to --batch-all-objects actually results in a performance regression, since having git perform sorting results in worse disk locality than the previous sequential iteration by Xapian docid.
git 2.19+ users with `--unordered' support benefit from improved storage locality, and speedups from storage locality dwarf the overhead of an extra external sort(1) invocation.
Even with consumer-grade SATA-II SSDs, the combo of --unordered and sort(1) provides a noticeable speedup since SSD latency remains a factor for --batch-all-objects.
git <2.19 users must upgrade git to get acceptable performance on slow storage and giant indexes, but git 2.19 was released nearly 5 years ago, so it's probably a reasonable requirement for performance.
The only remaining downside of this change for all users is the extra temporary disk space for sort(1) and comm(1); but the speedup provided with git 2.19+ is well worth it.
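The heart of the pipeline is a sorted-merge set difference: `comm -23 indexed accessible` prints lines found only in the first input, i.e. commits to prune. The Python sketch below illustrates only that step (the actual change shells out to xapian-delve, sed, sort, awk, and comm). Inputs must be sorted the same way and deduplicated (e.g. `sort -u`); for ASCII hex IDs, Python's string order matches `LC_ALL=C`.

```python
from typing import Iterable, Iterator

def comm_23(a: Iterable[str], b: Iterable[str]) -> Iterator[str]:
    """Yield lines present only in sorted input a, like `comm -23 a b`.

    Streams both inputs, holding one line from each side at a time, so
    memory use stays constant regardless of input size."""
    it_b = iter(b)
    cur_b = next(it_b, None)
    for x in a:
        while cur_b is not None and cur_b < x:
            cur_b = next(it_b, None)  # skip b-only lines
        if cur_b != x:
            yield x  # x is indexed but no longer accessible
```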
2023-04-07  umask: hoist out of InboxWritable
Since CodeSearchIdx doesn't deal with inboxes, it makes sense to split it out from inbox-specific code and start moving towards using OnDestroy to restore the umask at the end of scope and reducing extra functions.
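Python's closest analogue to an OnDestroy guard is a context manager. This hypothetical helper sets the umask and restores the previous value when the scope exits, even if an exception is raised; it's an illustration of the scoping idea, not the project's code.

```python
import os
from contextlib import contextmanager

@contextmanager
def umask_scope(mask: int):
    """Temporarily set the process umask, restoring the old one on exit."""
    old = os.umask(mask)
    try:
        yield old
    finally:
        os.umask(old)
```

Usage: `with umask_scope(0o077): ...` creates files as private only within the block.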
2023-04-05  cindex: do prune work while waiting for `git log -p'
`git log -p' can take several seconds to generate its initial output. SMP systems can be processing prunes during this delay, so let DS do a one-shot notification for us while prune is running.
On Linux, we'll also use the biggest pipe possible so git can do more CPU-intensive work to generate diffs while our Perl processes are indexing and likely hitting I/O wait.
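On Linux, "the biggest pipe possible" means raising a pipe's capacity with `F_SETPIPE_SZ` up to the limit in /proc/sys/fs/pipe-max-size. A Python sketch (hypothetical helper; `fcntl.F_SETPIPE_SZ` needs Python 3.10+ on Linux) that degrades gracefully elsewhere or when unprivileged limits refuse the request:

```python
import fcntl
import os

def grow_pipe(fd: int) -> int:
    """Try to raise a pipe's capacity to the system maximum (Linux only).
    Returns the resulting size, or -1 if unsupported or refused."""
    setsz = getattr(fcntl, "F_SETPIPE_SZ", None)
    if setsz is None:
        return -1  # not Linux, or Python < 3.10
    try:
        with open("/proc/sys/fs/pipe-max-size") as fh:
            want = int(fh.read())
        return fcntl.fcntl(fd, setsz, want)  # kernel returns actual size
    except OSError:
        return -1  # e.g. EPERM over per-user pipe limits, or no /proc
```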
2023-03-29  inotify: wrap with informative error message
As encountered by Louis DeLosSantos, Linux inotify is capped by a lesser-known limit than the standard RLIMIT_NOFILE (`ulimit -n`) value.
Give the user a hint about the fs.inotify.max_user_instances sysctl knob on EMFILE, since EMFILE alone may mislead users into thinking they've hit the (typically higher) RLIMIT_NOFILE limit.
I can test this on my system using:
    perl -I lib -MPublicInbox::Inotify -E \
        'my @x = map { PublicInbox::Inotify->new } (1..128)'
But I hesitate to include it in the test suite since triggering the limit can cause unrelated processes to fail.
Link: https://public-inbox.org/meta/CAE6jdTo8iQfNM9Yuk0Dwi-ARMxmQxX-onL8buXcQ9Ze3r0hKrg@mail.gmail.com/
Reported-by: Louis DeLosSantos <louis.delos@gmail.com>
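The wrapping pattern, sketched in Python (hypothetical helper; the project's Perl message wording may differ): catch EMFILE from the inotify constructor and re-raise it with a hint about the inotify-specific sysctl, keeping the original errno so callers can still check it.

```python
import errno

def with_inotify_hint(fn, *args):
    """Call fn (e.g. an inotify_init wrapper); on EMFILE, re-raise with a
    hint about fs.inotify.max_user_instances rather than RLIMIT_NOFILE."""
    try:
        return fn(*args)
    except OSError as e:
        if e.errno == errno.EMFILE:
            raise OSError(
                e.errno,
                f"{e.strerror} (check the fs.inotify.max_user_instances "
                "sysctl, not just `ulimit -n')",
            ) from e
        raise
```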
2023-03-25  codesearch: initial cut w/ -cindex tool
It seems relying on root commits is a reasonable way to deduplicate and handle repositories with common history. I initially wanted to shoehorn this into extindex, but decided a separate Xapian index layout capable of being EITHER external to handle many forks or internal (in $GIT_DIR/public-inbox-cindex) for small projects is the right way to go. Unlike most existing parts of public-inbox, this relies on absolute paths of $GIT_DIR stored in the Xapian DB and does not rely on the config file. We'll be relying on the config file to map absolute paths to public URL paths for WWW.
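To illustrate root-commit deduplication, the Python sketch below groups repositories that share any root commit (forks, mirrors) using a small union-find. The input shape (absolute `$GIT_DIR` paths mapped to root commit IDs) follows the description above, but the function and its output format are hypothetical; the real tool keeps this data in its Xapian index.

```python
from typing import Dict, List, Set

def group_by_roots(roots: Dict[str, Set[str]]) -> List[List[str]]:
    """roots maps a $GIT_DIR path to its root commit IDs. Repos sharing any
    root commit land in the same (sorted) group."""
    parent = {r: r for r in roots}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner: Dict[str, str] = {}  # root commit -> first repo seen with it
    for repo in sorted(roots):
        for oid in roots[repo]:
            if oid in owner:
                parent[find(repo)] = find(owner[oid])  # union
            else:
                owner[oid] = repo
    groups: Dict[str, List[str]] = {}
    for repo in sorted(roots):
        groups.setdefault(find(repo), []).append(repo)
    return sorted(groups.values())
```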
2023-03-10  doc: technical: document weird stuff in our codebase
Hopefully this makes things less surprising to new hackers.
2023-03-02  doc: drop hosted.txt
I'll have to downsize the server due to increased hosting costs, so stop advertising these mirrors. The inboxes still exist, for now, but will probably be proxied behind an ssh tunnel via a slow DSL connection, so it's not worth increasing traffic to them.
2023-01-30  tests: make slow tests easier-to-find
t/run.perl now prints the 10 slowest tests at startup, and I've added ./devel/longest-tests to print all tests sorted by elapsed time. This should allow us to notice outliers more quickly in the future.
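Picking the slowest N tests from recorded timings is a partial sort; in Python, `heapq.nlargest` does this without sorting everything. The input mapping is a hypothetical stand-in for whatever timing data the test runner records.

```python
import heapq
from typing import Dict, List, Tuple

def slowest(elapsed: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n slowest tests as (name, seconds), slowest first."""
    return heapq.nlargest(n, elapsed.items(), key=lambda kv: kv[1])
```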
2023-01-30  use Net::SSLeay (OpenSSL) for SHA-(1|256) if installed
On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as the Digest::SHA implementation from Perl, most likely due to an optimized assembly implementation. SHA-1 is a few percent faster, too.
2023-01-13  www_coderepo: /tree/ redirects to /$OID/s/
This is for compatibility with cgit to ease migration.
2023-01-11  hoist MailDiff and ContentDigestDbg out of lei
These will be reused in the web UI, too.
2023-01-04  www_coderepo: implement /$CODE_REPO/atom/ endpoint
This should be similar or identical to what's in cgit; and tie into the rest of the www_coderepo stuff.
2022-12-30  clone: support --post-update-hook= from grokmirror
This should be compatible with both grokmirror 1 and 2 behavior and serialized on a per-repo basis.
2022-12-22  tests: add tests for cloning coderepos w/ manifest
It's not much, yet, but it's something for the corner cases which I'm maybe not hitting under normal use.
2022-12-19  relnotes: 2.0.0 work-in-progress
I'm thinking the -nntpd regression fix will push this release out sooner rather than later...
2022-10-05  www_coderepo: wire up snapshot support
These should be compatible with cgit results.
2022-10-05  www_coderepo: an alternative to cgit
This will allow it to easily map a single coderepo to multiple inboxes (or multiple coderepos to any number of inboxes). For now, this is just a summary, but $REPO/$OID/s/ support will be added, along with archive downloads. Indexing of coderepos will probably be supported via -extindex, only.