Date | Commit message |
|
This cuts down on code somewhat (before I add more :x)
|
|
For totally bogus things in address fields, we'll fall back to
showing the original entry in the name column when using
Email::Address::XS.
The pure Perl version differs here, but we'll just let them be
different when it comes to handling bogus data.
|
|
This makes it easier to discover contemporary messages
crossposted to other groups within the same WWW instance.
The internal cache is necessary for giant threads, and the
expiry mechanism is necessary to prevent attackers from
trivially OOM-ing.
|
|
This will make it more effective for use as a cache key.
I'm not entirely happy with this sub being in the Git module
since it's used by lei and command-line tools, but that's
for another day to deal with...
|
|
I noticed this when I wrote a new (but probably unnecessary) *.t
test and `make check-run' failed since I omitted the final
semi-colon after `done_testing'.
|
|
I'm not sure how this happens (perl 5.34.1 on FreeBSD 13.2)
but it appears the {sock} check can succeed and then go undef
and become unable to call ->owner_pid.
This happens when libgit2 is in use, so perhaps that's a factor.
In any case, the rest of the tests succeed.
|
|
This should help us deal with MH sequence number packing and
invalidating mail_sync.sqlite3.
|
|
For thread skeletons with multiple roots, it makes sense to
note the strict|loose delineation even when the first message
matches the desired Message-ID.
|
|
When retrieving loose (Subject) matches for a thread, we wanted
the most recent matches in reverse chronological order.
However, when displaying the /T/ endpoint generating the thread
skeleton, we prefer ascending chronological order to match the
flow of the conversation.
Reported-by: Askar Safin <safinaskar@gmail.com>
Link: https://public-inbox.org/meta/CAPnZJGAqsh8ZhPaCAy5M2NZVNcWrr_Hr94t32VXiyiTXwD9jRQ@mail.gmail.com/
|
|
The MH format is widely-supported and used by various MUAs such
as mutt and sylpheed, and a MH-like format is used by mlmmj for
archives, as well. Locking implementations for writes are
inconsistent, so this commit doesn't support writes, yet.
inotify|EVFILT_VNODE watches aren't supported, yet, but that'll
have to come since MH allows packing unused integers and
renaming files.
|
|
This is a step towards improving the out-of-the-box experience
in achieving notifications without XS, extra downloads, and .so
loading + runtime mmap overhead.
This also fixes loongarch support of all Linux syscalls due to
a bad regexp :x
All the reachable Linux architectures listed at
<https://portal.cfarm.net/machines/list/> should be supported.
At the moment, there appears to be no reachable sparc* Linux
machines available to cfarm users.
Fixes: b0e5093aa3572a86 (syscall: add support for riscv64, 2022-08-11)
|
|
I noticed this bug while developing another feature and tests
were getting SIGHUP (since SIGHUP == 1 on most systems).
|
|
`lei index' should be capable of indexing the same way
`lei import' does, but without the indexing. I only noticed
this omission while developing a new feature.
|
|
This fixes t/mda.t with git 1.8.5
|
|
Older versions of git lack --batch-all-objects, and 2.6+ is
new enough already since v2, lei, etc all depend on it.
|
|
CentOS 7.x ships with git 1.8.5, so unless a CentOS 7.x user
enables 3rd-party repos[1], they'll be stuck with a version
of git without `--stable' (though I'm becoming skeptical of
indexing patchids at all).
[1] https://public-inbox.org/meta/20210421151308.yz5hzkgm75klunpe@nitro.local/
|
|
Test::More distributed with Perl 5.16.3 on CentOS 7.x expects
the `$how_many' argument for `skip' and warns when its
it's uninitialized, so quiet that warning down.
|
|
While it's not in a code path intended for WwwCoderepo and
RepoAtom (those classes provide their own ->zflush), this can
future-proof our code against future subclasses at a minor
performance cost.
|
|
Our read buffering only worked well with the stdout buffering on
glibc and *BSD libc, but not musl. When reading the stdout of
git(1), we are likely to get smaller buffers and require more
reads on musl-based systems (tested Alpine Linux 3.19.0).
Thus we must prevent ->translate from being called with an empty
argument list (denoting EOF). We'll also avoid some local
variable assignments while at it and favor the non-OO ->zflush
dispatch inside RepoAtom and WwwCoderepo subclasses.
|
|
And use it in convert-compact.t. This gives us nicer errors for
debugging a problem I noticed on Alpine Linux (tested 3.19.0)
|
|
This makes the C++ build work on Alpine Linux (tested 3.19.0)
without having to install g++ to get the `c++' executable.
I've tested this change with and without g++ on Alpine so it'll
continue to work if a user decides to install g++.
This should continue to work if the Xapian package on Alpine is
changed to link against libc++ instead of libstdc++, since we
only add `-lstdc++' as a fallback. For reference, Xapian is
already linked against libc++ and not libstdc++ on FreeBSD 13.x.
|
|
We don't actually need Inline::C support to build a standalone
executable implemented in C++.
|
|
The musl strftime(3) implementation on Alpine Linux 3.19.0
doesn't support `%k' and `%k' isn't in POSIX, either. So we
fall back to using the `sprintf' perlop in the user-facing UI
since leading zeroes require needless overhead for my eyes and
brain to parse in the time.
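
A minimal sketch of the same idea in Python rather than the project's Perl:
format the hour with a printf-style width instead of relying on the
non-POSIX `%k'. The helper name `fmt_hm' and the exact "%2d:%02d" layout
are assumptions for illustration, not the format used by this commit.

```python
import time

def fmt_hm(epoch):
    """Format UTC hour:minute with a space-padded hour, avoiding strftime's `%k`."""
    tm = time.gmtime(epoch)
    # "%2d" pads the hour with a space rather than a leading zero
    return "%2d:%02d" % (tm.tm_hour, tm.tm_min)

print(repr(fmt_hm(0)))
print(repr(fmt_hm(13 * 3600 + 5 * 60)))
```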
|
|
`lei inspect' uses the `iso8601' sub from LeiOverview.
|
|
BusyBox lsof(1) ignores the `-p PID' argument and shows
the open files for every process it knows about. BusyBox
lsof also lacks the `NODE' column of the non-BusyBox
implementation, so we'll rely on /proc/PID/fd/ in those
cases since the deleted file checks are Linux-only and
it's common to have procfs mounted on /proc on Linux.
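
A rough Python sketch of the /proc/PID/fd/ approach (not the project's test
code): on Linux, readlink(2) on an FD entry whose file was unlinked yields a
target ending in " (deleted)". The function name `deleted_open_files' and
its `proc' parameter are hypothetical.

```python
import os

def deleted_open_files(pid, proc="/proc"):
    """List link targets under PROC/PID/fd which the kernel marks as deleted."""
    fd_dir = os.path.join(proc, str(pid), "fd")
    found = []
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return found  # no procfs, no such PID, or no permission
    for name in names:
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the FD was closed while we were scanning
        if target.endswith(" (deleted)"):
            found.append(target)
    return found

print(deleted_open_files(os.getpid()))
```

The `proc' parameter only exists so the function can be pointed at a fake
directory tree when procfs is unavailable.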
|
|
Alpine Linux ships git-http-backend in the `git-daemon'
package separately from `git', so we must test for its
existence before attempting to test functionality which
depends on it.
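
One way such an existence check could look, sketched in Python instead of
the Perl test suite: ask git for its exec path and look for an executable
`git-http-backend' there. `have_git_http_backend' is a hypothetical helper,
and looking only in `git --exec-path' is an assumption.

```python
import os
import subprocess

def have_git_http_backend():
    """Return True if git-http-backend is executable in git's exec path."""
    try:
        res = subprocess.run(["git", "--exec-path"], capture_output=True,
                             text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return False  # git itself is missing or failed
    backend = os.path.join(res.stdout.strip(), "git-http-backend")
    return os.path.isfile(backend) and os.access(backend, os.X_OK)

if not have_git_http_backend():
    print("skipping tests which need git-http-backend")
```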
|
|
There are many Linux systems (GNU or otherwise) which do not have
strace(1) installed.
|
|
Our pure-Perl (PublicInbox::AddressPP) fallback is closer to the
preferred Email::Address::XS (EAX) behavior than Mail::Address
is for ->name support. EAX tends to be overkill with good spam
filtering, and using our own fallback means life is easier for
users with neither C/XS build tools nor a pre-built EAX package.
|
|
Post-image blob OIDs are what solver already works with, and
longer OIDs may not be available in historical mail archives.
`patchid' turns out to be unsuitable since:
1) git's default diff algorithm has changed over time
2) users may use different diff options to improve readability
Of course, we could eventually run `lei rediff' during the index
phase to regenerate patchids, but that's out-of-scope for now
and likely to be too expensive.
|
|
This will allow us to use p2q-compatible specifications such as
"dfpost7" to only capture blob OIDs which are 7 characters in
length (the indexer will always index down to 7 characters)
|
|
While chdir simplifies path manipulation on our end, its use
falls over when PERL5LIB/@INC contains relative paths which need
to be made absolute. It's fewer lines of code to eliminate
chdir usage than it is to keep using relative paths in most
places.
|
|
Most xap_terms callers do not benefit from the hashref
return value, and we can delay hashmap use until
List::Util::uniqstr if needed.
|
|
Xapian has always sorted termlist iterators, so we now:
1) break out of the iterator loop early on non-matches
2) avoid doing sorting ourselves
As a result, we'll also favor the wantarray forms of xap_terms
and all_terms to preserve sort order in most cases.
Confirmed by the Xapian maintainer: <20231201184844.GO4059@survex.com>
Link: https://lists.xapian.org/pipermail/xapian-discuss/2023-December/010013.html
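
The early-exit idea can be sketched in Python over a plain sorted list
instead of a Xapian termlist iterator; `terms_with_prefix' and the sample
terms are made up for illustration and are not the project's xap_terms.

```python
from bisect import bisect_left

def terms_with_prefix(sorted_terms, prefix):
    """Collect terms starting with `prefix` from an already-sorted list."""
    out = []
    # jump straight to the first candidate instead of scanning from the start
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        if not term.startswith(prefix):
            break  # sorted input: no later term can match
        out.append(term)
    return out

terms = ["G1", "Q123", "Qabc", "Qabd", "XDFID1"]
print(terms_with_prefix(terms, "Q"))
```

Since the input is sorted, the result keeps that order without a separate
sort step.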
|
|
As of SpamAssassin 4.0.0, spamc(1) corrupts messages with NUL in
the body when the `--headers' switch is used. This increases
transport costs, but most spamc/spamd setups are via local
sockets, so it's unlikely to be significant.
Link: https://bugs.debian.org/1057749
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
There's no need to recurse and trigger deep recursion warnings
when we hit a coderepo with a known hash (SHA-1 vs SHA-256).
Noticed while pruning the 1200+ repos on a git.kernel.org
mirror.
|
|
This future proofs the index against git auto-abbreviation
needing more characters as the repo grows. It'll be useful for
joining against inboxes using dfpre.
As with emails, we'll continue indexing abbreviated blob OIDs
down to 7 hex characters so a SHA-1 git repo will have all
abbreviations of the OID from 7-39 hex characters in addition
to the 40 character unabbreviated form.
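
To make the term count concrete, a small Python sketch (not the Perl
indexer; `oid_abbreviations' and the sample OID are made up) lists every
prefix of a full hex OID from 7 characters up to its full length:

```python
def oid_abbreviations(oid, min_len=7):
    """Return every prefix of `oid` from `min_len` characters up to the full OID."""
    if len(oid) < min_len:
        return []
    return [oid[:n] for n in range(min_len, len(oid) + 1)]

# a made-up 40-character SHA-1-style OID, for illustration only
full = "0123456789abcdef0123456789abcdef01234567"
terms = oid_abbreviations(full)
print(len(terms), terms[0], terms[-1] == full)
```

For a 40-character OID this yields 34 terms: 33 abbreviations of 7 to 39
characters plus the unabbreviated form.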
|
|
Oddly, Perl did not warn about this. Spotted while confirming
abbreviated OIDs are also indexed when unabbreviated OIDs
appear.
|
|
It looks like DragonFly inherited this from FreeBSD to
allow us to save some syscalls.
|
|
I forgot to set TMPDIR=/path/to/non-tmpfs again.
|
|
I mixed up "flush" with "close" :x
Fixes: 87b7f633f241 (xap_helper: implement mset endpoint for WWW, IMAP, etc...)
|
|
As with mail search, a cindex may be updated while WWW is
serving requests. Thus we must reopen the Xapian DB when
the revision we're using becomes stale.
|
|
We no longer vivify the intermediate $ibx->{-hide} hashref,
instead we use $ibx->{-hide_$KEY} directly. This avoids
an intermediate hashref and extra hash table lookups.
|
|
This is a convenient (and slightly memory-saving) alternative to
specifying a `publicinbox.*.url' entry for every single inbox
when using publicinbox.wwwListing.
|
|
For inboxes associated with an extindex (currently only the
special "all" one), we can share the git process across
all those inboxes unambiguously when retrieving full SHA-1
blobs.
The comment for my proposed patch is also out-of-date as that
git speedup has been a part of git since 2.33.
|
|
We no longer trigger git cleanups from the Inbox package since
`git cat-file' users have their own cleanup to support git
coderepos not associated with any inbox.
This change means we unconditionally expire SQLite and Xapian
FDs and some internal caches regardless of git activity. The
old logic was irrelevant to Gcf2 (libgit2) users anyways since
we couldn't determine whether or not an inbox was active based
on {inflight} git requests, and upcoming changes will make it
inaccurate for all extindex/cindex users as well.
Opening SQLite and Xapian DBs is fairly cheap; so it's a small
price to pay to reduce memory use and fragmentation.
|
|
This brings a no-op -cindex scan of a git.kernel.org mirror
down from 70s to 10s with a hot cache on a busy machine.
CPU-intensive SHA-256 fingerprinting of the `git show-ref'
result can be parallelized on shard workers. Future changes can
move more of the initial scan setup phase into shard workers for
more parallelism.
But most of the performance for skipping unchanged repos is
gained from delaying the commit time reading until we've seen
the fingerprint is out-of-date, since reading commit times
requires a large amount of I/O compared to only reading refs
for fingerprints.
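
A minimal Python sketch of the skip-unchanged check, assuming the
fingerprint is simply a SHA-256 digest over the raw `git show-ref' output;
the function names and sample refs below are hypothetical and do not
reflect how -cindex stores its fingerprints.

```python
import hashlib

def refs_fingerprint(show_ref_output):
    """SHA-256 hex digest of raw `git show-ref` output (bytes)."""
    return hashlib.sha256(show_ref_output).hexdigest()

def needs_reindex(show_ref_output, stored_fingerprint):
    """True when the refs changed since the fingerprint was stored."""
    return refs_fingerprint(show_ref_output) != stored_fingerprint

# made-up show-ref output, for illustration only
sample = b"0123456789abcdef0123456789abcdef01234567 refs/heads/master\n"
stored = refs_fingerprint(sample)
print(needs_reindex(sample, stored))
```

Only when `needs_reindex' is true would the more expensive commit time
reads be worth doing.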
|
|
When setting up stdin for commands, the write_file API is
convenient enough nowadays to not be worth having special
support with process spawning.
When reading stdout of commands, we should probably be using
utf8_maybe everywhere since there'll always be legacy encodings
in git repos.
Reading regular files with :utf8 also results in worse memory
management since the file size cannot be used as a hint.
|
|
We no longer fork after cidx_init, so there's no need to spend
CPU cycles on the getpid() syscall, especially since it's no
longer cached on glibc while syscalls are also more expensive
these days due to CPU vulnerability mitigations.
|
|
It saves some code in case we keep libgit2 around.
|
|
This will allow WWW to use a combined LeiALE-like
thing to reduce git processes.
|