Date | Commit message (Collapse) |
|
barrier (synchronous checkpoint) is better than ->done with
parallel lei commands being issued (via '&' or different
terminals), since repeatedly stopping and restarting processes
doesn't play nicely with expensive tasks like `lei reindex'.
This introduces a slight regression in maintaining more
processes (and thus resource use) when lei is idle, but that'll
be fixed in the next commit.
|
|
This adds support for the "POST /$INBOX/$MSGID/?x=m?q=..."
added last year to support per-thread searches
764035c83 (www: support POST /$INBOX/$MSGID/?x=m&q=, 2023-03-30)
This only supports instances of public-inbox since 764035c83,
but unfortunately there hasn't been a release since then.
|
|
There are still some places where on_destroy isn't suitable,
This gets rid of getpid() calls in most of those cases to
reduce syscall costs and cleanup syscall trace output.
|
|
getpid() isn't cached by glibc nowadays and system calls are
more expensive due to CPU vulnerability mitigations. To
ensure we switch to the new semantics properly, introduce
a new `on_destroy' function to simplify callers.
Furthermore, most OnDestroy correctness is often tied to the
process which creates it, so make the new API default to
guarded against running in subprocesses.
For cases which require running in all children, a new
PublicInbox::OnDestroy::all call is provided.
|
|
We must chomp the newline in the branch name if it's set.
Reported-by: Rob Herring <robh@kernel.org>
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
Fixes: 73830410e4336b77 (treewide: use run_qx where appropriate, 2023-10-27)
|
|
The packaged Perl on OpenBSD i386 supports 64-bit file offsets
but not 64-bit integer support for 'q' and 'Q' with `pack'.
Since servers aren't likely to require lock files larger than
2 GB (we'd need an inbox with >2 billion messages), we can
workaround the Perl build limitation with explicit padding.
File::FcntlLock isn't packaged for OpenBSD <= 7.4 (but should be
in future releases), but I can test i386 OpenBSD on an extremely
slow VM.
Big endian support can be done, too, but I have no idea if
there's 32-bit BE users around nowadays...
|
|
MH sequence numbers can be analogous to IMAP UIDs and NNTP
article numbers (or more like IMAP MSNs with clients which
pack). In any case, sort then numerically by default to avoid
surprising users who treat NNTP spools and mlmmj archives as MH
folders. This gives more coherent git history and resulting
NNTP/IMAP numbering when round-tripping MH -> v2 -> (NNTP|IMAP) -> MH
|
|
LeiToMail can't sort v2 output, but sorting MH input (and
NNTP spool + mlmmj archives) numerically makes sense.
|
|
BSD::Resource isn't packaged for Alpine (as of 3.19), but we
also have optional Inline::C support and already rely on calling
setrlimit(2) directly from the Inline::C version of pi_fork_exec.
|
|
The good news (compared to lei) is we only have to worry about
imports and don't care about the filename nor keywords, so it's
immune to .mh_sequences writing inconsistencies across MH
implementations and sequence number packing.
We still assume the writer will write the mail file with one of:
* rename(2) to create the final sequence number filename
* a single write(2) if not relying on rename(2)
mlmmj and mutt satisfy these requirements. Python's Lib/mailbox.py
may, I'm not sure...
|
|
While syscall symbols (e.g. SYS_*) have changed on us in FreeBSD
during the history of Sys::Syscall and this project and did bite
us in some cases; the actual numbers don't get recycled for new
syscalls. We're also fortunate that sendmsg and recvmsg syscalls
and associated msghdr and cmsg structs predate the BSD forks and
are compatible across all the BSDs I've tried.
OpenBSD routes Perl `syscall' through libc; while NetBSD + FreeBSD
document procedures for maintaining backwards compatibility.
It looks like Dragonfly follows FreeBSD, here.
Tested on i386 OpenBSD, and amd64 {Free,Net,Open,Dragonfly}BSD
This enables *BSD users to use lei, -cindex and future SCM_RIGHTS-only
features without needing Inline::C.
[1] https://cvsweb.openbsd.org/src/gnu/usr.bin/perl/gen_syscall_emulator.pl
[2] https://www.netbsd.org/docs/internals/en/chap-processes.html#syscall_versioning
[3] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
|
|
Noticed while adding wildcard support to WwwCoderepo...
|
|
We moved to PublicInbox::Eml a while back and have no plans
to go back to using Email::MIME, so don't tempt users and
packagers to waste disk space on Email::MIME.
|
|
For totally bogus things in address fields, we'll fall back to
showing the original entry in the name column when using
Email::Address::XS.
The pure Perl version differs here, but we'll just let them be
different when it comes to handling bogus data.
|
|
This should help us deal with MH sequence number packing and
invalidating mail_sync.sqlite3.
|
|
The MH format is widely-supported and used by various MUAs such
as mutt and sylpheed, and a MH-like format is used by mlmmj for
archives, as well. Locking implementations for writes are
inconsistent, so this commit doesn't support writes, yet.
inotify|EVFILT_VNODE watches aren't supported, yet, but that'll
have to come since MH allows packing unused integers and
renaming files.
|
|
This is a step towards improving the out-of-the-box experience
in achieving notifications without XS, extra downloads, and .so
loading + runtime mmap overhead.
This also fixes loongarch support of all Linux syscalls due to
a bad regexp :x
All the reachable Linux architectures listed at
<https://portal.cfarm.net/machines/list/> should be supported.
At the moment, there appears to be no reachable sparc* Linux
machines available to cfarm users.
Fixes: b0e5093aa3572a86 (syscall: add support for riscv64, 2022-08-11)
|
|
`lei index' should be capable of indexing the the same way
`lei import' does, but without the indexing. I only noticed
this omission while developing a new feature.
|
|
We don't need v2 features nor scalability to test POP3 stuff.
|
|
Older versions of git lack --batch-all-objects, and 2.6+ is
new enough already since v2, lei, etc all depend on it.
|
|
Test::More distributed with Perl 5.16.3 on CentOS 7.x expects
the `$how_many' argument for `skip' and warns when its
uninitialized, so quiet that warning down.
|
|
musl uses "I/O error" while glibc uses "Input/output error"
I wish something like strerrorname_np(3) were portable
and built into Perl so we could just match on /EIO/.
|
|
Our read buffering only worked well with the stdout buffering on
glibc and *BSD libc, but not musl. When reading the stdout of
git(1), we are likely to get smaller buffers and require more
reads on musl-based systems (tested Alpine Linux 3.19.0).
Thus we must prevent ->translate from being called with an empty
argument list (denoting EOF). We'll also avoid some local
variable assignments while at it and favor the non-OO ->zflush
dispatch inside RepoAtom and WwwCoderepo subclasses.
|
|
My user home directory on Alpine has S_ISGID set on it and every
subdirectory inherits it. This includes my work tree and the
t/data-gen/* subdirectories. So just ignore the presence (or
non-presence) of the S_ISGID bit on directories descended from
the cached t/data-gen/* directories.
Now, public-inbox-convert may want to preserve S_ISGID on the
newly-created v2 inbox, but that's a separate discussion.
|
|
And use it in convert-compact.t This gives us nicer errors for
debugging a problem I noticed on Alpine Linux (tested 3.19.0)
|
|
We don't actually need Inline::C support to build a standalone
executable implemented in C++.
|
|
BusyBox lsof(1) ignores the `-p PID' argument and shows
the open files for every process it knows about. BusyBox
lsof also lacks the `NODE' column of the non-BusyBox
implementation, so we'll rely on /proc/PID/fd/ in those
cases since the deleted file checks are Linux-only and
it's common to have procfs is mounted on /proc on Linux.
|
|
While join(1) is POSIX, busybox on Alpine 3.19.0 does not
provide its functionality. So just skip tests for now since
it's too much trouble to provide a workaround for an otherwise
common POSIX command.
|
|
Alpine Linux ships git-http-backend in the `git-daemon'
package separately from `git', so we must test for its
existence before attempting to test functionality which
depends on it.
|
|
Our pure-Perl (PublicInbox::AddressPP) fallback is closer to the
preferred Email::Address::XS (EAX) behavior than Mail::Address
is for ->name support. EAX tends to be overkill with good spam
filtering, and using our own fallback means life is easier for
users with neither C/XS build tools nor a pre-built EAX package.
|
|
This will allow us to use p2q-compatible specifications such as
"dfpost7" to only capture blob OIDs which are 7 characters in
length (the indexer will always index down to 7 characters)
|
|
Our code aims to respect $ENV{PWD} (and therefore symlinks) as
much as possible to ensure portability across devices when repos
and indices are on portable or shared storage. Thus we can't
rely on Cwd::abs_path and ought to favor File::Spec->rel2abs
whenever absolute paths are required.
I noticed this when working on a VM where my worktree is a
symlink to a more reliable device.
|
|
This future proofs the index against git auto-abbreviation
needing more characters as the repo grows. It'll be useful for
joining against inboxes using dfpre.
As with emails, we'll continue indexing abbreviated blob OIDs
down to 7 hex characters so a SHA-1 git repo will have all
abbreviations of the OID from 7-39 hex characters in addition
to the 40 character unabbreviated form.
|
|
I forgot to set TMPDIR=/path/to/non-tmpfs again.
|
|
By ignoring SIGPIPE, we hit our own error path and emit an informative
error message instead of dying abruptly and requiring somebody to run
`echo $?' to see the child status from their shell.
|
|
This gets rid of the "\ No newline at end of file"
since it's distracting noise.
|
|
--no-import-before skips importing entire messages, not just
keywords, so it can cause permanent data loss if -o is pointed
to precious data.
|
|
Absolute pathnames of git coderepos are stored in the cindex,
but we should favor paths relative to $ENV{PWD} since it
respects symlinks in the heirarchy.
Respecting symlinks makes it easier to migrate cindex to
new storage as old storage wears out and to relocate the
storage device onto another machine.
|
|
Accepting @ARGV without switches ends up being ambiguous with
optional parameters for --join and --show. Requiring users to
specify `--join=' or `--show=' is a bit awkward (as it with
-clone --objstore= and the like, but that is historical baggage
we need to carry at this point...)
|
|
This is a major step in solving the problem of having to
manually associate hundreds/thousands of coderepos with
hundreds/thousands of public-inboxes to power solver
(and more).
|
|
The C++ version will allow us to take full advantage of Xapian's
APIs for better queries, and the Perl bindings version can still
be advantageous in the future since we'll be able to support
timeouts effectively.
|
|
Code search will require SCM_RIGHTS, and Inline::C on BSDs
probably isn't too onerous a dependency for new features as
all the ones I've tested have it packaged.
Furthermore, requiring SCM_RIGHTS isn't far off since OpenBSD's
Perl is patched to route the `syscall' perlop through libc[1],
while NetBSD[2] and FreeBSD[3] actually do strive for backwards
compatibility. We'd just need to use the numbers and not rely
on syscall.ph shipped with Perl since the macro names themselves
are unstable.
[1] https://cvsweb.openbsd.org/src/gnu/usr.bin/perl/gen_syscall_emulator.pl
[2] https://www.netbsd.org/docs/internals/en/chap-processes.html#syscall_versioning
[3] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
|
|
Our use of MID_ESC characters was only intended for the pathname
component of URIs and not appropriate for the query string
component. So use a different $unsafe parameter list for
uri_escape to make the result appropriate for query strings by
disallowing [\&\'\+=] characters. Most notably, this change
also allows us to accept `/' (slash) unescaped to make dfn: queries
nicer to look at.
Finally, we'll also add a ascii_html call on the URI-escaped
result as an extra safety measure even though it's not really
needed.
As far as I can tell, the code without this fix didn't result in
in an HTML injection since all our uses of uri_escape did escape
angle brackets.
Reported-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Link: https://public-inbox.org/meta/87o7ff4nlk.fsf@collabora.com/
Tested-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
|
|
The LibreSSL 3.7.2 on my OpenBSD 7.3 VM seems return 7 bytes of
junk data before EOF/ECONNRESET when a client attempts to write
plain-text to a TLS socket.
Tested-by: Štěpán Němec <stepnem@smrk.net>
|
|
This also reduces repetition in the setup code.
|
|
As with our popen_* uses, we can simplify callers by using
attach_pid to handle automatic reaping upon close.
|
|
Yes, that was valid Perl syntax :x
|
|
For users hosting read-only mirrors (via clone|fetch) and feeding
inboxes via -watch
|
|
We only care about error checking when stdout is an mbox output
pointed to a pathname. This is noticeable with `lei up' with
multiple non-mbox* destinations. We'll also ensure throwing
exceptions to trigger lei->x_it from lei->do_env results in the
epoll/kqueue watch being discarded, otherwise commands may never
terminate (leading to stuck tests)
|
|
The association data is just stored as deflated JSON in Xapian
metadata keys of shard[0] for now. It should be reasonably
compact and fit in memory for now since we'll assume sane,
non-malicious git coderepo history, for now.
The new cindex-join.t test requires TEST_REMOTE_JOIN=1 to be
set in the environment and tests the joins against the inboxes
and coderepos of two small projects with a common history.
Internally, we'll use `ibx_off', `root_off' instead of `ibx_id'
and `root_id' since `_id' may be mistaken for columns in an SQL
database which they are not.
|