path: root/lib/PublicInbox
Date Commit message
2024-01-10 www: use autodie in more coderepo places
This cuts down on code somewhat (before I add more :x)
2024-01-10 address: avoid [ undef, undef ] address pairs
For totally bogus things in address fields, we'll fall back to showing the original entry in the name column when using Email::Address::XS. The pure Perl version differs here, but we'll just let them be different when it comes to handling bogus data.
2024-01-10 www: linkify inbox addresses in To/Cc headers
This makes it easier to discover contemporary messages crossposted to other groups within the same WWW instance. The internal cache is necessary for giant threads, and the expiry mechanism is necessary to prevent attackers from trivially OOM-ing.
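A minimal sketch of the cache-plus-expiry idea described above, written in Python rather than the project's Perl. The class name, the limits, and the eviction policy are assumptions for illustration and not the actual implementation: the cache bounds both the age and the number of entries, so a giant thread gets cache hits while a flood of distinct addresses cannot grow memory without limit.

```python
import time
from collections import OrderedDict

class ExpiringCache:
    """Hypothetical bounded cache: entries expire by age and by count."""

    def __init__(self, max_entries=1000, ttl=60.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self.clock = clock
        self._data = OrderedDict()  # key -> (expiry time, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires, value = item
        if self.clock() >= expires:
            del self._data[key]  # stale entry, drop it
            return None
        return value

    def put(self, key, value):
        self._data.pop(key, None)  # re-inserting moves the key to the newest slot
        self._data[key] = (self.clock() + self.ttl, value)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the oldest insertion
```

Capping the entry count is what defeats the trivial OOM case; the time-based expiry only keeps results from going stale.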
2024-01-10 git: lowercase host in host_prefix_url
This will make it more effective for use as a cache key. I'm not entirely happy with this sub being in the Git module since it's used by lei and command-line tools, but that's for another day to deal with...
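To illustrate why lowercasing the host helps as a cache key, here is a small Python sketch. The helper name `host_prefix_key` is hypothetical and this is not the Perl sub mentioned above; it assumes URLs without a `user:password@` part, since that part would be lowercased as well.

```python
from urllib.parse import urlsplit, urlunsplit

def host_prefix_key(url):
    """Hypothetical helper: lowercase scheme and host, keep the path as-is."""
    parts = urlsplit(url)
    # Hostnames are case-insensitive, paths are not, so only the former is folded.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, "", ""))
```

With this, `https://Example.ORG/Repo.git` and `https://example.org/Repo.git` map to the same key, while `/Repo.git` and `/repo.git` stay distinct.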
2024-01-10 test_common: key2sub: don't require final ';' in scripts
I noticed this when I wrote a new (but probably unnecessary) *.t test and `make check-run' failed since I omitted the final semi-colon after `done_testing'.
2024-01-10 git: workaround occasional -watch error message
I'm not sure how this happens (perl 5.34.1 on FreeBSD 13.2) but it appears the {sock} check can succeed and then go undef and become unable to call ->owner_pid. This happens when libgit2 is in use, so perhaps that's a factor. In any case, the rest of the tests succeed.
2024-01-04 lei: MH: support inotify to detect updates
This should help us deal with MH sequence number packing and invalidating mail_sync.sqlite3.
2024-01-02 view: always show strict|loose note w/ multi-roots
For thread skeletons with multiple roots, it makes sense to note the strict|loose delineation even when the first message matches the desired Message-ID.
2024-01-02 over: re-sort Subject matches for WWW /T/ endpoint
When retrieving loose (Subject) matches for a thread, we wanted the most recent matches in reverse chronological order. However, when displaying the /T/ endpoint generating the thread skeleton, we prefer ascending chronological order to match the flow of the conversation.
Reported-by: Askar Safin <safinaskar@gmail.com>
Link: https://public-inbox.org/meta/CAPnZJGAqsh8ZhPaCAy5M2NZVNcWrr_Hr94t32VXiyiTXwD9jRQ@mail.gmail.com/
2023-12-30 lei: support reading MH for convert+import+index
The MH format is widely supported and used by various MUAs such as mutt and sylpheed, and an MH-like format is used by mlmmj for archives as well. Locking implementations for writes are inconsistent, so this commit doesn't support writes yet. inotify|EVFILT_VNODE watches aren't supported yet, but that'll have to come since MH allows packing unused integers and renaming files.
2023-12-29 pure Perl inotify support
This is a step towards improving the out-of-the-box experience in achieving notifications without XS, extra downloads, and .so loading + runtime mmap overhead. This also fixes loongarch support of all Linux syscalls due to a bad regexp :x
All the reachable Linux architectures listed at <https://portal.cfarm.net/machines/list/> should be supported. At the moment, there appear to be no reachable sparc* Linux machines available to cfarm users.
Fixes: b0e5093aa3572a86 (syscall: add support for riscv64, 2022-08-11)
2023-12-16 lei: use ->child_error API properly
I noticed this bug while developing another feature and tests were getting SIGHUP (since SIGHUP == 1 on most systems).
2023-12-16 lei index: support +L: labels
`lei index' should be capable of indexing the same way `lei import' does, but without the indexing. I only noticed this omission while developing a new feature.
2023-12-16 git: quiet down `rev-parse --git-path' errors
This fixes t/mda.t with git 1.8.5
2023-12-16 cindex: --prune needs git 2.6+
Older versions of git lack --batch-all-objects, and 2.6+ is new enough already since v2, lei, etc all depend on it.
2023-12-16 searchidx: quiet down old git patchid
CentOS 7.x ships with git 1.8.5, so unless a CentOS 7.x user enables 3rd-party repos[1], they'll be stuck with a version of git without `--stable' (though I'm becoming skeptical of indexing patchids at all).
[1] https://public-inbox.org/meta/20210421151308.yz5hzkgm75klunpe@nitro.local/
2023-12-16 tests: quiet uninitialized warnings on CentOS 7.x
Test::More distributed with Perl 5.16.3 on CentOS 7.x expects the `$how_many' argument for `skip' and warns when it's uninitialized, so quiet that warning down.
2023-12-13 gzip_filter: use OO ->zflush dispatch
While it's not in a code path intended for WwwCoderepo and RepoAtom (those classes provide their own ->zflush), this can future-proof our code against future subclasses at a minor performance cost.
2023-12-13 www_coderepo: fix read buffering
Our read buffering only worked well with the stdout buffering on glibc and *BSD libc, but not musl. When reading the stdout of git(1), we are likely to get smaller buffers and require more reads on musl-based systems (tested Alpine Linux 3.19.0). Thus we must prevent ->translate from being called with an empty argument list (denoting EOF). We'll also avoid some local variable assignments while at it and favor the non-OO ->zflush dispatch inside RepoAtom and WwwCoderepo subclasses.
2023-12-13 test_common: extract oct_is from search.t
And use it in convert-compact.t. This gives us nicer errors for debugging a problem I noticed on Alpine Linux (tested 3.19.0).
2023-12-13 xap_helper_cxx: support clang w/o `c++' executable
This makes the C++ build work on Alpine Linux (tested 3.19.0) without having to install g++ to get the `c++' executable. I've tested this change with and without g++ on Alpine so it'll continue to work if a user decides to install g++. This should continue to work if the Xapian package on Alpine is changed to link against libc++ instead of libstdc++, since we only add `-lstdc++' as a fallback. For reference, Xapian is already linked against libc++ and not libstdc++ on FreeBSD 13.x
2023-12-13 xap_helper_cxx: decouple from Inline::C
We don't actually need Inline::C support to build a standalone executable implemented in C++.
2023-12-13 treewide: avoid strftime %k for portability
The musl strftime(3) implementation on Alpine Linux 3.19.0 doesn't support `%k' and `%k' isn't in POSIX, either. So we fall back to using the `sprintf' perlop in the user-facing UI since leading zeroes require needless overhead for my eyes and brain to parse in the time.
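The same fallback can be sketched in Python, as an illustration rather than the project's Perl: format the hour with a printf-style `%2d' (space-padded, like `%k') and use strftime only for conversions that POSIX specifies. The function name and the output layout are assumptions.

```python
from datetime import datetime

def fmt_time(dt):
    """Format as 'YYYY-MM-DD  H:MM' with a space-padded hour, without using %k."""
    # %2d pads single-digit hours with a space instead of a leading zero.
    return "%s %2d:%02d" % (dt.strftime("%Y-%m-%d"), dt.hour, dt.minute)
```

For 07:05 on 2023-12-13 this yields `2023-12-13  7:05`, where the second space is the hour padding.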
2023-12-13 lei inspect: drop unneeded strftime import
`lei inspect' uses the `iso8601' sub from LeiOverview.
2023-12-13 tests: attempt compatibility w/ busybox lsof
BusyBox lsof(1) ignores the `-p PID' argument and shows the open files for every process it knows about. BusyBox lsof also lacks the `NODE' column of the non-BusyBox implementation, so we'll rely on /proc/PID/fd/ in those cases since the deleted file checks are Linux-only and it's common to have procfs mounted on /proc on Linux.
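A rough Python sketch of the /proc/PID/fd/ approach, not the test suite's actual code. It relies on Linux procfs reporting unlinked files as symlink targets ending in " (deleted)"; the function name and the `proc' parameter are hypothetical.

```python
import os

def deleted_open_files(pid="self", proc="/proc"):
    """List targets of open FDs whose backing file was unlinked (Linux procfs only)."""
    fd_dir = os.path.join(proc, str(pid), "fd")
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the FD was closed while we were scanning
        if target.endswith(" (deleted)"):
            found.append(target)
    return found
```

Unlike BusyBox lsof, this only ever looks at the one process it was asked about.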
2023-12-13 tests: account for missing git-http-backend
Alpine Linux ships git-http-backend in the `git-daemon' package separately from `git', so we must test for its existence before attempting to test functionality which depends on it.
2023-12-13 t/io: strace is optional on Linux
There are many Linux systems (GNU or otherwise) which do not have strace(1) installed.
2023-12-10 imap: replace Mail::Address fallback with AddressPP
Our pure-Perl (PublicInbox::AddressPP) fallback is closer to the preferred Email::Address::XS (EAX) behavior than Mail::Address is for ->name support. EAX tends to be overkill with good spam filtering, and using our own fallback means life is easier for users with neither C/XS build tools nor a pre-built EAX package.
2023-12-09 cindex: switch --join to use dfpost7 by default
Post-image blob OIDs are what solver already works with, and longer OIDs may not be available in historical mail archives. `patchid' turns out to be unsuitable since:
1) git's default diff algorithm has changed over time
2) users may use different diff options to improve readability
Of course, we could eventually run `lei rediff' during the index phase to regenerate patchids, but that's out-of-scope for now and likely to be too expensive.
2023-12-09 xap_helper: support term length limit
This will allow us to use p2q-compatible specifications such as "dfpost7" to only capture blob OIDs which are 7 characters in length (the indexer will always index down to 7 characters)
2023-12-09 xap_helper_cxx: drop chdir usage in build
While chdir simplifies path manipulation on our end, its use falls over when PERL5LIB/@INC contains relative paths which need to be made absolute. It's fewer lines of code to eliminate chdir usage than it is to keep using relative paths in most places.
2023-12-09 *search: favor wantarray form of xap_terms
Most xap_terms callers do not benefit from the hashref return value, and we can delay hashmap use until List::Util::uniqstr if needed.
2023-12-09 *search: simplify handling of Xapian term iterators
Xapian has always sorted termlist iterators, so we now:
1) break out of the iterator loop early on non-matches
2) avoid doing sorting ourselves
As a result, we'll also favor the wantarray forms of xap_terms and all_terms to preserve sort order in most cases.
Confirmed by the Xapian maintainer: <20231201184844.GO4059@survex.com>
Link: https://lists.xapian.org/pipermail/xapian-discuss/2023-December/010013.html
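The early-exit idea can be sketched in Python over any already-sorted sequence of terms. This is an illustration under that assumption, not the Perl code or the Xapian API, and the function name is hypothetical.

```python
def terms_with_prefix(sorted_terms, prefix):
    """Collect terms starting with prefix from an already-sorted sequence."""
    out = []
    for term in sorted_terms:
        if term.startswith(prefix):
            out.append(term)  # matches arrive in sorted order, no re-sort needed
        elif term > prefix:
            break  # sorted input: no later term can match
    return out
```

Because matching terms form one contiguous run in sorted input, the result comes out sorted and the loop can stop at the first term past that run.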
2023-12-08 workaround --headers bug with spamc(1)
As of SpamAssassin 4.0.0, spamc(1) corrupts messages with NUL in the body when the `--headers' switch is used. This increases transport costs, but most spamc/spamd setups are via local sockets, so it's unlikely to be significant.
Link: https://bugs.debian.org/1057749
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
2023-12-06 cindex: avoid recursion on prune
There's no need to recurse and trigger deep recursion warnings when we hit a coderepo with a known hash (SHA-1 vs SHA-256). Noticed while pruning the 1200+ repos on a git.kernel.org mirror.
2023-12-05 cindex: index full (40/64 char) hex blob OIDs
This future proofs the index against git auto-abbreviation needing more characters as the repo grows. It'll be useful for joining against inboxes using dfpre. As with emails, we'll continue indexing abbreviated blob OIDs down to 7 hex characters so a SHA-1 git repo will have all abbreviations of the OID from 7-39 hex characters in addition to the 40 character unabbreviated form.
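As a quick illustration of the term set described above (Python, with a hypothetical function name, not the indexer's code): every prefix from 7 hex characters up to the full OID is produced, which is 34 terms for a 40-character SHA-1 OID.

```python
def oid_index_terms(oid_hex, min_len=7):
    """Return every prefix of oid_hex from min_len chars up to the full OID."""
    return [oid_hex[:n] for n in range(min_len, len(oid_hex) + 1)]
```

The same function covers 64-character SHA-256 OIDs, where it produces 58 terms.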
2023-12-05 searchidx: drop redundant decl in index_git_blob_id
Oddly, Perl did not warn about this. Spotted while confirming abbreviated OIDs are also indexed when unabbreviated OIDs appear.
2023-12-01 xap_helper: enable stderr assignment on DragonFly
It looks like DragonFly inherited this from FreeBSD to allow us to save some syscalls.
2023-12-01 tests: note kevent+tmpfs failures on DragonFly <= 6.4
I forgot to set TMPDIR=/path/to/non-tmpfs again.
2023-12-01 xap_helper.h: fix non-assignable stderr case
I mixed up "flush" with "close" :x
Fixes: 87b7f633f241 (xap_helper: implement mset endpoint for WWW, IMAP, etc...)
2023-11-30 codesearch: use retry_reopen for WWW
As with mail search, a cindex may be updated while WWW is serving requests. Thus we must reopen the Xapian DB when the revision we're using becomes stale.
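The reopen-and-retry pattern can be sketched as follows in Python. The exception class, the function signature, and the retry limit are all assumptions for illustration; the project's own retry_reopen is Perl and may differ in every detail.

```python
class DatabaseModified(Exception):
    """Stand-in for the error a search library raises on a stale revision."""

def retry_reopen(db, query_fn, max_tries=10):
    """Run query_fn(db), reopening db and retrying when its revision is stale."""
    for _ in range(max_tries):
        try:
            return query_fn(db)
        except DatabaseModified:
            db.reopen()  # pick up the latest committed revision
    raise RuntimeError("database kept changing after %d tries" % max_tries)
```

Bounding the number of tries keeps a request from spinning forever while an indexer commits repeatedly.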
2023-11-30 inbox: shrink data structures for publicinbox.*.hide
We no longer vivify the intermediate $ibx->{-hide} hashref, instead we use $ibx->{-hide_$KEY} directly. This avoids an intermediate hashref and extra hash table lookups.
2023-11-30 www_listing: support publicInbox.nameIsUrl
This is a convenient (and slightly memory-saving) alternative to specifying a `publicinbox.*.url' entry for every single inbox when using publicinbox.wwwListing.
2023-11-30 git_async_cat: use git from "all" extindex if possible
For inboxes associated with an extindex (currently only the special "all" one), we can share the git process across all those inboxes unambiguously when retrieving full SHA-1 blobs. The comment for my proposed patch is also out-of-date as that git speedup has been a part of git since 2.33.
2023-11-30 inbox: expire resources more aggressively
We no longer trigger git cleanups from the Inbox package since `git cat-file' users have their own cleanup to support git coderepos not associated with any inbox. This change means we unconditionally expire SQLite and Xapian FDs and some internal caches regardless of git activity. The old logic was irrelevant to Gcf2 (libgit2) users anyways since we couldn't determine whether or not an inbox was active based on {inflight} git requests, and upcoming changes will make it inaccurate for all extindex/cindex users as well. Opening SQLite and Xapian DBs is fairly cheap; so it's a small price to pay to reduce memory use and fragmentation.
2023-11-30 cindex: speed up initial scan setup phase
This brings a no-op -cindex scan of a git.kernel.org mirror down from 70s to 10s with a hot cache on a busy machine. CPU-intensive SHA-256 fingerprinting of the `git show-ref' result can be parallelized on shard workers. Future changes can move more of the initial scan setup phase into shard workers for more parallelism. But most of the performance for skipping unchanged repos is gained from delaying the commit time reading until we've seen the fingerprint is out-of-date, since reading commit times requires a large amount of I/O compared to only reading refs for fingerprints.
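A minimal Python sketch of the fingerprint check, assuming the fingerprint is simply a SHA-256 digest over the raw `git show-ref' output. Both function names are hypothetical and the real -cindex code may hash or store things differently.

```python
import hashlib

def refs_fingerprint(show_ref_output):
    """SHA-256 hex digest of raw `git show-ref' output (bytes)."""
    return hashlib.sha256(show_ref_output).hexdigest()

def needs_reindex(stored_fp, show_ref_output):
    """True when the refs changed since the stored fingerprint was taken."""
    return stored_fp != refs_fingerprint(show_ref_output)
```

Only when `needs_reindex' returns true would the costlier step of reading commit times be worth doing, which is where most of the savings for unchanged repos come from.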
2023-11-30 spawn: drop IO layer support from redirects
When setting up stdin for commands, the write_file API is convenient enough nowadays to not be worth having special support with process spawning. When reading stdout of commands, we should probably be using utf8_maybe everywhere since there'll always be legacy encodings in git repos. Reading regular files with :utf8 also results in worse memory management since the file size cannot be used as a hint.
2023-11-30 cindex: skip getpid guard for most OnDestroy use
We no longer fork after cidx_init, so there's no need to spend CPU cycles on the getpid() syscall, especially since it's no longer cached on glibc while syscalls are also more expensive these days due to CPU vulnerability mitigations.
2023-11-30 git: share unlinked pack checking code with gcf2
It saves some code in case we keep libgit2 around.
2023-11-30 cindex: store extensions.objectFormat with repo data
This will allow WWW to use a combined LeiALE-like thing to reduce git processes.