path: root/t
Date Commit message
2024-04-17 lei: use ->barrier to commit to lei/store
barrier (synchronous checkpoint) is better than ->done with parallel lei commands being issued (via '&' or different terminals), since repeatedly stopping and restarting processes doesn't play nicely with expensive tasks like `lei reindex'. This introduces a slight regression in maintaining more processes (and thus resource use) when lei is idle, but that'll be fixed in the next commit.
2024-04-12 lei q: support --thread-id=$MSGID || -T $MSGID
This adds support for the "POST /$INBOX/$MSGID/?x=m&q=..." endpoint added last year to support per-thread searches in 764035c83 (www: support POST /$INBOX/$MSGID/?x=m&q=, 2023-03-30). This only works with instances running public-inbox since 764035c83, but unfortunately there hasn't been a release since then.
2024-04-03 treewide: avoid getpid for more ownership checks
There are still some places where on_destroy isn't suitable. This gets rid of getpid() calls in most of those cases to reduce syscall costs and clean up syscall trace output.
2024-04-03 treewide: avoid getpid() for OnDestroy checks
getpid() isn't cached by glibc nowadays and system calls are more expensive due to CPU vulnerability mitigations. To ensure we switch to the new semantics properly, introduce a new `on_destroy' function to simplify callers. Furthermore, OnDestroy correctness is often tied to the process which creates it, so make the new API default to guarding against running in subprocesses. For cases which require running in all children, a new PublicInbox::OnDestroy::all call is provided.
2024-03-10 import: fix handling of init.defaultBranch
We must chomp the newline in the branch name if it's set.
Reported-by: Rob Herring <robh@kernel.org>
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
Fixes: 73830410e4336b77 (treewide: use run_qx where appropriate, 2023-10-27)
2024-02-06 pop3d: support fcntl locks on OpenBSD i386
The packaged Perl on OpenBSD i386 supports 64-bit file offsets but lacks 64-bit integer support for 'q' and 'Q' with `pack'. Since servers aren't likely to require lock files larger than 2 GB (we'd need an inbox with >2 billion messages), we can work around the Perl build limitation with explicit padding. File::FcntlLock isn't packaged for OpenBSD <= 7.4 (but should be in future releases), but I can test i386 OpenBSD on an extremely slow VM. Big endian support can be done, too, but I have no idea if there are 32-bit BE users around nowadays...
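The explicit-padding idea can be illustrated outside Perl. The following is a minimal Python sketch, not the project's code: it assumes a little-endian host and a non-negative offset below 2 GB, and `pack_off_t_le` is a hypothetical helper name chosen for this example.

```python
import struct

def pack_off_t_le(value: int) -> bytes:
    """Pack a 64-bit little-endian off_t without a 64-bit format code.

    Only valid for 0 <= value < 2**31: the low 32 bits carry the value
    and the high 32 bits are explicit zero padding.
    """
    if not 0 <= value < 2**31:
        raise ValueError("offset out of range for the padded encoding")
    # '<l' is a 32-bit little-endian signed integer, '4x' is four zero pad bytes
    return struct.pack("<l4x", value)
```

For values in that range the result is byte-for-byte identical to a native 64-bit little-endian encoding, which is why the limitation is harmless for small lock files.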
2024-02-01 lei: sort MH inputs sequentially by default
MH sequence numbers can be analogous to IMAP UIDs and NNTP article numbers (or more like IMAP MSNs with clients which pack). In any case, sort them numerically by default to avoid surprising users who treat NNTP spools and mlmmj archives as MH folders. This gives more coherent git history and resulting NNTP/IMAP numbering when round-tripping MH -> v2 -> (NNTP|IMAP) -> MH.
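Why numeric ordering matters can be shown with a short Python sketch. This is only an illustration, assuming MH message files are the directory entries whose names consist solely of ASCII digits; `sort_mh_names` is a hypothetical helper, not a lei function.

```python
def sort_mh_names(names):
    """Return MH message filenames in numeric order, skipping other entries."""
    numeric = [n for n in names if n.isascii() and n.isdigit()]
    # key=int orders "9" before "10"; a plain string sort would not
    return sorted(numeric, key=int)
```

A lexical sort would place "10" before "9", which is the kind of surprise the numeric default avoids.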
2024-02-01 lei convert: explicitly allow --sort for inputs
LeiToMail can't sort v2 output, but sorting MH input (and NNTP spool + mlmmj archives) numerically makes sense.
2024-01-30 spawn: support some rlimit uses via Inline::C
BSD::Resource isn't packaged for Alpine (as of 3.19), but we also have optional Inline::C support and already rely on calling setrlimit(2) directly from the Inline::C version of pi_fork_exec.
2024-01-30 watch: support incremental updates from MH
The good news (compared to lei) is we only have to worry about imports and don't care about the filename nor keywords, so it's immune to .mh_sequences writing inconsistencies across MH implementations and sequence number packing. We still assume the writer will write the mail file with one of:
* rename(2) to create the final sequence number filename
* a single write(2) if not relying on rename(2)
mlmmj and mutt satisfy these requirements. Python's Lib/mailbox.py may, I'm not sure...
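The rename(2) requirement on writers can be sketched as follows. This is a hedged Python illustration of a well-behaved MH writer, not code from the project or from any MUA; `deliver_mh` and the `.tmp-` prefix are assumptions made for the example.

```python
import os
import tempfile

def deliver_mh(folder: str, raw: bytes) -> str:
    """Write `raw` into an MH folder so readers never see a partial file."""
    nums = [int(n) for n in os.listdir(folder) if n.isascii() and n.isdigit()]
    final = os.path.join(folder, str(max(nums, default=0) + 1))
    # Temporary name in the same directory keeps rename(2) on one filesystem.
    fd, tmp = tempfile.mkstemp(prefix=".tmp-", dir=folder)
    with os.fdopen(fd, "wb") as fh:
        fh.write(raw)
    # Caveat: rename(2) replaces an existing target, so a real writer must
    # also guard against two writers choosing the same sequence number.
    os.rename(tmp, final)
    return final
```

Because the final numeric filename only appears once the content is complete, a watcher that reacts to the new name can import the file immediately.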
2024-01-30 syscall: use pure Perl sendmsg/recvmsg on *BSD
While syscall symbols (e.g. SYS_*) have changed on us in FreeBSD during the history of Sys::Syscall and this project and did bite us in some cases, the actual numbers don't get recycled for new syscalls. We're also fortunate that the sendmsg and recvmsg syscalls and the associated msghdr and cmsg structs predate the BSD forks and are compatible across all the BSDs I've tried.
OpenBSD routes Perl `syscall' through libc[1], while NetBSD[2] and FreeBSD[3] document procedures for maintaining backwards compatibility. It looks like Dragonfly follows FreeBSD here.
Tested on i386 OpenBSD, and amd64 {Free,Net,Open,Dragonfly}BSD.
This enables *BSD users to use lei, -cindex and future SCM_RIGHTS-only features without needing Inline::C.
[1] https://cvsweb.openbsd.org/src/gnu/usr.bin/perl/gen_syscall_emulator.pl
[2] https://www.netbsd.org/docs/internals/en/chap-processes.html#syscall_versioning
[3] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
2024-01-17 config: glob2re: fix over-matching /**/foo
Noticed while adding wildcard support to WwwCoderepo...
2024-01-17 tests: clarify Email::MIME is only for development
We moved to PublicInbox::Eml a while back and have no plans to go back to using Email::MIME, so don't tempt users and packagers to waste disk space on Email::MIME.
2024-01-10 address: avoid [ undef, undef ] address pairs
For totally bogus things in address fields, we'll fall back to showing the original entry in the name column when using Email::Address::XS. The pure Perl version differs here, but we'll just let them be different when it comes to handling bogus data.
2024-01-04 lei: MH: support inotify to detect updates
This should help us deal with MH sequence number packing and invalidating mail_sync.sqlite3.
2023-12-30 lei: support reading MH for convert+import+index
The MH format is widely-supported and used by various MUAs such as mutt and sylpheed, and a MH-like format is used by mlmmj for archives, as well. Locking implementations for writes are inconsistent, so this commit doesn't support writes, yet. inotify|EVFILT_VNODE watches aren't supported, yet, but that'll have to come since MH allows packing unused integers and renaming files.
2023-12-29 pure Perl inotify support
This is a step towards improving the out-of-the-box experience in achieving notifications without XS, extra downloads, and .so loading + runtime mmap overhead. This also fixes loongarch support of all Linux syscalls, which was broken by a bad regexp :x
All the reachable Linux architectures listed at <https://portal.cfarm.net/machines/list/> should be supported. At the moment, there appear to be no reachable sparc* Linux machines available to cfarm users.
Fixes: b0e5093aa3572a86 (syscall: add support for riscv64, 2022-08-11)
2023-12-16 lei index: support +L: labels
`lei index' should be capable of indexing the same way `lei import' does, but without the indexing. I only noticed this omission while developing a new feature.
2023-12-16 t/pop3d-limit: use v1 inbox to test on ancient git
We don't need v2 features nor scalability to test POP3 stuff.
2023-12-16 cindex: --prune needs git 2.6+
Older versions of git lack --batch-all-objects, and 2.6+ is new enough already since v2, lei, etc all depend on it.
2023-12-16 tests: quiet uninitialized warnings on CentOS 7.x
Test::More distributed with Perl 5.16.3 on CentOS 7.x expects the `$how_many' argument for `skip' and warns when it's uninitialized, so quiet that warning down.
2023-12-13 t/lei-import: relax EIO regexp
musl uses "I/O error" while glibc uses "Input/output error". I wish something like strerrorname_np(3) were portable and built into Perl so we could just match on /EIO/.
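The wish to match on the symbolic name rather than the libc-specific message can be illustrated in a language that exposes errno names. This Python sketch is an analogy only; `errno_name` is a hypothetical helper and nothing here reflects how the Perl test was actually changed.

```python
import errno
import os

def errno_name(exc: OSError) -> str:
    """Return the symbolic errno name (e.g. 'EIO') for an OSError."""
    return errno.errorcode.get(exc.errno, "UNKNOWN")

err = OSError(errno.EIO, os.strerror(errno.EIO))
print(errno_name(err))  # symbolic name, the same on every libc
print(err.strerror)     # message text, which varies between libc implementations
```

Comparing the symbolic name keeps a test independent of which C library produced the message.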
2023-12-13 www_coderepo: fix read buffering
Our read buffering only worked well with the stdout buffering on glibc and *BSD libc, but not musl. When reading the stdout of git(1), we are likely to get smaller buffers and require more reads on musl-based systems (tested Alpine Linux 3.19.0). Thus we must prevent ->translate from being called with an empty argument list (denoting EOF). We'll also avoid some local variable assignments while at it and favor the non-OO ->zflush dispatch inside RepoAtom and WwwCoderepo subclasses.
2023-12-13 t/convert-compact: allow S_ISGID bit
My user home directory on Alpine has S_ISGID set on it and every subdirectory inherits it. This includes my work tree and the t/data-gen/* subdirectories. So just ignore the presence (or non-presence) of the S_ISGID bit on directories descended from the cached t/data-gen/* directories. Now, public-inbox-convert may want to preserve S_ISGID on the newly-created v2 inbox, but that's a separate discussion.
2023-12-13 test_common: extract oct_is from search.t
And use it in convert-compact.t. This gives us nicer errors for debugging a problem I noticed on Alpine Linux (tested 3.19.0).
2023-12-13 xap_helper_cxx: decouple from Inline::C
We don't actually need Inline::C support to build a standalone executable implemented in C++.
2023-12-13 tests: attempt compatibility w/ busybox lsof
BusyBox lsof(1) ignores the `-p PID' argument and shows the open files for every process it knows about. BusyBox lsof also lacks the `NODE' column of the non-BusyBox implementation, so we'll rely on /proc/PID/fd/ in those cases since the deleted file checks are Linux-only and it's common to have procfs mounted on /proc on Linux.
2023-12-13 t/cindex*: skip --join when join(1) is missing
While join(1) is POSIX, busybox on Alpine 3.19.0 does not provide its functionality. So just skip tests for now since it's too much trouble to provide a workaround for an otherwise common POSIX command.
2023-12-13 tests: account for missing git-http-backend
Alpine Linux ships git-http-backend in the `git-daemon' package separately from `git', so we must test for its existence before attempting to test functionality which depends on it.
2023-12-10 imap: replace Mail::Address fallback with AddressPP
Our pure-Perl (PublicInbox::AddressPP) fallback is closer to the preferred Email::Address::XS (EAX) behavior than Mail::Address is for ->name support. EAX tends to be overkill with good spam filtering, and using our own fallback means life is easier for users with neither C/XS build tools nor a pre-built EAX package.
2023-12-09 xap_helper: support term length limit
This will allow us to use p2q-compatible specifications such as "dfpost7" to only capture blob OIDs which are 7 characters in length (the indexer will always index down to 7 characters)
2023-12-06 t/cindex: fix test when worktree PWD is a symlink
Our code aims to respect $ENV{PWD} (and therefore symlinks) as much as possible to ensure portability across devices when repos and indices are on portable or shared storage. Thus we can't rely on Cwd::abs_path and ought to favor File::Spec->rel2abs whenever absolute paths are required. I noticed this when working on a VM where my worktree is a symlink to a more reliable device.
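The difference between lexical and symlink-resolving absolute paths can be sketched in Python, where `os.path.realpath` plays a role comparable to Cwd::abs_path. The helper below is a hypothetical illustration that mimics the File::Spec->rel2abs approach; it is not the project's implementation.

```python
import os

def rel2abs(path, base=None):
    """Make `path` absolute lexically, without resolving symlinks."""
    if os.path.isabs(path):
        return path
    if base is None:
        # Prefer the shell's logical directory; fall back to getcwd().
        base = os.environ.get("PWD") or os.getcwd()
    return os.path.join(base, path)
```

If the worktree is reached through a symlink, this keeps the symlinked prefix in the result, whereas `os.path.realpath` would replace it with the physical location and tie stored paths to one device.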
2023-12-05 cindex: index full (40/64 char) hex blob OIDs
This future proofs the index against git auto-abbreviation needing more characters as the repo grows. It'll be useful for joining against inboxes using dfpre. As with emails, we'll continue indexing abbreviated blob OIDs down to 7 hex characters so a SHA-1 git repo will have all abbreviations of the OID from 7-39 hex characters in addition to the 40 character unabbreviated form.
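The set of abbreviations described above can be enumerated with a short Python sketch. It only shows which strings are meant; how the indexer actually stores them in Xapian is not covered, and `oid_terms` is a hypothetical name.

```python
def oid_terms(oid: str, min_len: int = 7):
    """Return every abbreviation of `oid` from `min_len` characters up to the full OID."""
    return [oid[:n] for n in range(min_len, len(oid) + 1)]
```

For a 40-character SHA-1 OID this yields the 33 abbreviations of 7 to 39 characters plus the unabbreviated form.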
2023-12-01 tests: note kevent+tmpfs failures on DragonFly <= 6.4
I forgot to set TMPDIR=/path/to/non-tmpfs again.
2023-12-01 t/xap_helper: make sendmsg errors more obvious
By ignoring SIGPIPE, we hit our own error path and emit an informative error message instead of dying abruptly and requiring somebody to run `echo $?' to see the child status from their shell.
2023-11-29 www: mail_diff: add final newline before diffing
This gets rid of the "\ No newline at end of file" since it's distracting noise.
2023-11-29 lei q: fix --no-import-before completion + docs
--no-import-before skips importing entire messages, not just keywords, so it can cause permanent data loss if -o is pointed to precious data.
2023-11-29 admin: resolve_git_dir respects symlinks
Absolute pathnames of git coderepos are stored in the cindex, but we should favor paths relative to $ENV{PWD} since it respects symlinks in the hierarchy. Respecting symlinks makes it easier to migrate cindex to new storage as old storage wears out and to relocate the storage device onto another machine.
2023-11-29 cindex: require `-g GIT_DIR' or `-r PROJECT_ROOT'
Accepting @ARGV without switches ends up being ambiguous with optional parameters for --join and --show. Requiring users to specify `--join=' or `--show=' is a bit awkward (as it is with -clone --objstore= and the like), but that is historical baggage we need to carry at this point...
2023-11-29 www: load and use cindex join data
This is a major step in solving the problem of having to manually associate hundreds/thousands of coderepos with hundreds/thousands of public-inboxes to power solver (and more).
2023-11-29 xap_helper: implement mset endpoint for WWW, IMAP, etc...
The C++ version will allow us to take full advantage of Xapian's APIs for better queries, and the Perl bindings version can still be advantageous in the future since we'll be able to support timeouts effectively.
2023-11-29 t/cindex*: require SCM_RIGHTS for these tests
Code search will require SCM_RIGHTS, and Inline::C on BSDs probably isn't too onerous a dependency for new features as all the ones I've tested have it packaged. Furthermore, requiring SCM_RIGHTS isn't far off since OpenBSD's Perl is patched to route the `syscall' perlop through libc[1], while NetBSD[2] and FreeBSD[3] actually do strive for backwards compatibility. We'd just need to use the numbers and not rely on syscall.ph shipped with Perl since the macro names themselves are unstable.
[1] https://cvsweb.openbsd.org/src/gnu/usr.bin/perl/gen_syscall_emulator.pl
[2] https://www.netbsd.org/docs/internals/en/chap-processes.html#syscall_versioning
[3] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
2023-11-27 www: qs_html: fix escaping of `q' param
Our use of MID_ESC characters was only intended for the pathname component of URIs and not appropriate for the query string component. So use a different $unsafe parameter list for uri_escape to make the result appropriate for query strings by disallowing [\&\'\+=] characters. Most notably, this change also allows us to accept `/' (slash) unescaped to make dfn: queries nicer to look at. Finally, we'll also add an ascii_html call on the URI-escaped result as an extra safety measure even though it's not really needed. As far as I can tell, the code without this fix didn't result in an HTML injection since all our uses of uri_escape did escape angle brackets.
Reported-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Link: https://public-inbox.org/meta/87o7ff4nlk.fsf@collabora.com/
Tested-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
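The two escaping layers described above can be sketched in Python with the standard library. This is an analogy rather than a port: `qs_value_html` is a hypothetical helper, and Python's `quote` escapes a somewhat different character set than the project's uri_escape call (for instance, it also escapes `:`).

```python
from html import escape
from urllib.parse import quote

def qs_value_html(q: str) -> str:
    """Escape a search query for use as a query-string value inside HTML."""
    # quote() leaves only letters, digits, "_.-~" and the `safe` set
    # unescaped, so & ' + = are percent-encoded while "/" stays as-is.
    return escape(quote(q, safe="/"), quote=True)
```

The HTML-escaping step is redundant for this output, since percent-encoding already removes angle brackets, ampersands and quotes, which mirrors the "extra safety measure" remark in the commit message.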
2023-11-27 t/nntpd-tls: avoid test failure on OpenBSD 7.3
The LibreSSL 3.7.2 on my OpenBSD 7.3 VM seems to return 7 bytes of junk data before EOF/ECONNRESET when a client attempts to write plain-text to a TLS socket.
Tested-by: Štěpán Němec <stepnem@smrk.net>
2023-11-26 xap_helper: allow PI_NO_CXX to disable C++ in more places
This also reduces repetition in the setup code.
2023-11-26 xap_client: attach PID to the IO object
As with our popen_* uses, we can simplify callers by using attach_pid to handle automatic reaping upon close.
2023-11-25 t/cindex-join: fix warnings from a missing comma
Yes, that was valid Perl syntax :x
2023-11-22 watch: support `watch=false' to negate watchspam
For users hosting read-only mirrors (via clone|fetch) and feeding inboxes via -watch
2023-11-22 lei_to_mail: don't close STDOUT unless it is a mbox* output
We only care about error checking when stdout is an mbox output pointed to a pathname. This is noticeable with `lei up' with multiple non-mbox* destinations. We'll also ensure throwing exceptions to trigger lei->x_it from lei->do_env results in the epoll/kqueue watch being discarded, otherwise commands may never terminate (leading to stuck tests)
2023-11-21 cindex: rename --associate to --join, test w/ real repos
The association data is just stored as deflated JSON in Xapian metadata keys of shard[0] for now. It should be reasonably compact and fit in memory since we'll assume sane, non-malicious git coderepo history. The new cindex-join.t test requires TEST_REMOTE_JOIN=1 to be set in the environment and tests the joins against the inboxes and coderepos of two small projects with a common history. Internally, we'll use `ibx_off' and `root_off' instead of `ibx_id' and `root_id' since `_id' may be mistaken for columns in an SQL database, which they are not.
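Storing a small data structure as deflated JSON can be sketched in Python as below. The function names and the sample layout are hypothetical, and the exact compression framing used by the project is not stated in the commit message; this sketch uses zlib's default stream format.

```python
import json
import zlib

def dump_join_data(data) -> bytes:
    """Serialize `data` to compact JSON and compress it with zlib."""
    text = json.dumps(data, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"))

def load_join_data(blob: bytes):
    """Reverse dump_join_data()."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical layout: offsets into lists rather than database-style IDs.
sample = {"ibx_off": [0, 1], "root_off": [[0], [0, 1]]}
blob = dump_join_data(sample)
```

Highly repetitive JSON such as long lists of small integers tends to compress well, which fits the expectation that the data stays compact enough to hold in memory.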