Errors dumped from the previous run often get lost, so just
spew them to syslog, since it's a standard place for errors that
don't make it to a client. Note: we don't rely on $SIG{__WARN__}
since some of the Net:: stuff will write directly to STDERR
(as will external processes).
|
|
Sharing a single lei-daemon across multiple processes still
exhibits reliability problems, and reliably checking
lei-daemon's inotify internals seems impossible with a shared daemon.
Even without lei-daemon sharing, "make check-run" is a few
seconds faster than "make check" for me.
|
|
On slower systems, even a 100ms delay may not be enough,
so loop and retry in hopes of an early exit on faster
systems.
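A minimal sketch of the loop-and-retry idea, assuming a hypothetical `wait_for` helper and a `$ready` callback that reports whether the awaited condition holds. The 100ms step and 5s ceiling are illustrative defaults, not the values used by the actual tests.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: poll $ready (a code ref) until it returns true or
# $timeout seconds elapse. Fast systems exit on an early iteration, while
# slow systems keep retrying instead of failing after one fixed delay.
sub wait_for {
	my ($ready, $timeout, $step) = @_;
	$timeout //= 5;   # illustrative ceiling, in seconds
	$step //= 0.1;    # 100ms between checks
	my $deadline = time() + $timeout;
	while (1) {
		return 1 if $ready->();
		return 0 if time() >= $deadline;
		sleep($step); # Time::HiRes::sleep accepts fractional seconds
	}
}
```

The return value lets a test distinguish "condition met" from "gave up", so a genuine failure still surfaces once the ceiling is hit.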
|
|
This works with existing inotify/EVFILT_VNODE functionality to
propagate changes made from one Maildir to another Maildir.
I chose the lei/store worker process to handle this since
propagating changes back into lei-daemon on a massive scale
could lead to dead-locking while both processes are attempting
to write to each other. Eliminating IPC overhead is a nice
side effect, but could hurt performance if Maildirs are slow.
The code for "lei export-kw" is significantly revamped to match
the new code used in the "lei/store" daemon. It should be more
correct w.r.t. corner-cases and stale entries, but perhaps
better tests need to be written.
squashed:
t/lei-auto-watch: increase delay for FreeBSD kevent
My FreeBSD VM seems to need longer for this test than inotify
under Linux, likely because the kevent support code needs to be
more complicated.
|
|
For lei-index to work in parallel with MUA access and upcoming
inotify-based updates, mail_sync.sqlite3 must always be
up-to-date for read-only worker processes (ahead of everything
else). So rely on the default auto-commit behavior and hope
SQLite WAL can reduce some of the overheads involved with
writes.
|
|
While messages from removed inboxes were removed from Xapian
search, --gc failed to remove messages from over.sqlite3
entirely. They no longer show up in the topic summary view.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/20210830201723.dehoul4y6gpqf2cp@nitro.local/
|
|
Another step towards moving more of our internals to use binary
OIDs to avoid needless conversions before hitting disk.
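To illustrate what "binary OIDs" means here: a SHA-1 object ID is 40 hex characters but only 20 raw bytes, and Perl's `pack`/`unpack` with the `H*` template convert between the two. The helper names below are hypothetical, not public-inbox's own functions.

```perl
use strict;
use warnings;

# Hypothetical helpers: hex OID <-> raw binary OID.
sub oid_hex2bin { pack('H*', $_[0]) }   # 40 hex chars -> 20 bytes
sub oid_bin2hex { unpack('H*', $_[0]) } # 20 bytes -> 40 lowercase hex chars

my $hex = 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391'; # git's empty blob
my $bin = oid_hex2bin($hex);
printf "%d hex chars -> %d bytes\n", length($hex), length($bin);
```

Keeping the binary form throughout means the conversion happens at most once, at the edges where hex is actually displayed.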
|
|
Open file handles in lei-daemon may be unstable so we need to
account for readlink() returning undef.
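A sketch of the defensive pattern, assuming a Linux-style `/proc/self/fd` layout and a hypothetical `fd_paths` helper: since a descriptor can be closed or replaced between listing and reading, `readlink()` may return undef, and that case is skipped rather than used.

```perl
use strict;
use warnings;

# Hypothetical helper: map file descriptors to pathnames via /proc
# (Linux-specific assumption). readlink() returns undef if the fd has
# gone away or the entry is not a symlink, so such fds are skipped.
sub fd_paths {
	my @fds = @_;
	my %paths;
	for my $fd (@fds) {
		my $p = readlink("/proc/self/fd/$fd");
		next unless defined $p; # unstable handle: ignore it
		$paths{$fd} = $p;
	}
	\%paths;
}
```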
|
|
This makes the mirroring and code retrieval instructions less
obstructive. Relying on WwwText means we only use our Linkify
module to make hrefs of full URLs; making relative and shortened
hrefs off-limits; hopefully this isn't too much of a problem.
coderepo information remains duplicated on every page since
(IMHO) coderepos are an important feature; but nobody besides me
has ever bothered to configure coderepos, so I suppose it's
fine...
Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210826132747.6gxuwnhftyf7c6hp@nitro.local/
|
|
Searching inboxes with an empty query no longer gives 500 errors
due to Xapian. Also, improve the error message when no inboxes
match, since saying no inboxes exist yet is wrong.
|
|
It's a special case and we can show it in the HTML display
without affecting manifest.js.gz generation.
|
|
Since we favor ->over in WWW and IMAP, move this method to
->over to reduce open files in common cases.
This fixes the /$EXTINDEX_NAME/all.mbox.gz endpoint for extindex
entries (which may get expensive...).
|
|
extindex doesn't use the same config stuff as normal
"publicinbox" entries, so we'll need a separate function
for them.
|
|
The only place where we could return wide characters with -httpd
was the raw $INBOX_DIR/description text, which is now converted
to octets.
All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode,
so length() and bytes::length() are equivalent on reads. For
socket writes, any non-octet data would warn about wide characters,
and test_httpd is strict about warnings.
All gzipped buffers are also octets, as is PublicInbox::Eml->body,
and anything from PerlIO objects ("git cat-file --batch" output,
filesystems), so bytes::length was unnecessary in all those places.
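A small illustration (not code from -httpd) of why `bytes::length` becomes unnecessary: `length()` counts characters on a wide-character string, but once the string is encoded to UTF-8 octets, as is now done for the description text, `length()` equals the byte count written to the socket.

```perl
use strict;
use warnings;

# Wide-character string: length() counts characters, not bytes.
my $desc = "caf\x{e9} \x{263a}";
my $chars = length($desc);

# After encoding to UTF-8 octets, length() is the on-the-wire byte count.
my $octets = $desc;
utf8::encode($octets);
my $bytes = length($octets);
print "$chars characters, $bytes bytes\n";
```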
|
|
Since "lei up" is more often useful than not and incurs negligible
overhead, enable --save by default and allow --no-save to work.
This also fixes a long-standing bug when overwriting --output
destinations with saved searches: dedupe data from previous
searches is reset and no longer influences the new (changed)
search, so results no longer go missing if two sequential
invocations of "lei q --save" point to the same --output.
|
|
This declutters the topic view since these links seem rarely
used. Atom and mbox.gz links probably make most sense when
users have read the HTML and decide the topic is worth following
or downloading.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210816154444.sj3ks2sikq3x2ywx@nitro.local/
|
|
Persistent lei-daemon still leads to ECONNRESET client errors on
FreeBSD, and maxing out the kern.ipc.soacceptqueue sysctl (as
documented in the FreeBSD listen(2) manpage) doesn't seem to
help.
"make check-run" is still 4-5s faster than "make check" on my
FreeBSD VM even after this change, so it's still a worthwhile
improvement.
|
|
As documented, File::Spec->canonpath does not canonicalize
"/../". While we want to do our best to preserve symlinks in
pathnames, leaving "/../" can mislead our inotify|kqueue usage.
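The `canonpath` limitation can be demonstrated directly. The `collapse_dotdot` function below is a hypothetical, purely lexical resolver for absolute paths, shown only to illustrate the trade-off: it never consults the filesystem, so it keeps symlinked names, but it can disagree with the kernel if a dropped component is itself a symlink.

```perl
use strict;
use warnings;
use File::Spec;

# File::Spec->canonpath does not collapse "x/../y" (on Unix).
my $p = '/home/user/mail/../Maildir';
my $canon = File::Spec->canonpath($p); # still contains "/../"

# Hypothetical lexical resolver; assumes an absolute path.
sub collapse_dotdot {
	my ($path) = @_;
	my @out;
	for my $c (split m!/+!, $path) {
		next if $c eq '' || $c eq '.';
		if ($c eq '..') {
			pop @out; # "/.." at the root stays at the root
		} else {
			push @out, $c;
		}
	}
	'/' . join('/', @out);
}
```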
|
|
Storing relative paths with '..' in them can be expensive to
resolve when running 'lei up', so prefer storing canonicalized
absolute paths. We only do this for paths with '..' in them,
though, since this can lose symlink info.
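A sketch of the "only when '..' is present" rule, using a hypothetical `store_path` helper. `Cwd::abs_path` resolves symlinks, which is exactly why it is applied only to paths containing a `..` component; other paths are merely made absolute with `File::Spec->rel2abs`, which leaves symlinked names intact.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Spec;

# Hypothetical helper: canonicalize only paths with a ".." component.
sub store_path {
	my ($p) = @_;
	if ($p =~ m!(?:\A|/)\.\.(?:/|\z)!) {
		my $abs = abs_path($p); # follows symlinks; may be undef on failure
		return $abs if defined $abs;
	}
	File::Spec->rel2abs($p); # absolute, but symlinks are preserved
}
```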
|
|
None of our code elsewhere accounts for non-*nix pathnames and
it's not worth our time to start. So stop wasting CPU cycles
giving the illusion that we'd care about non-*nix pathnames.
|
|
We still support usage without Xapian, so ensure our tests
work when Xapian bindings are missing.
|
|
For users using the native TLS functionality of -httpd (instead
of using nginx + Plack::Middleware::ReverseProxy),
psgi.url_scheme=http was wrong and would lead to improper
redirects.
|
|
Boost relies on knowledge of all inboxes in a given config file
to work properly. So while we support indexing a subset of
inboxes, we must still account for boost in inboxes we're not
indexing. So split internal inbox groups into "known" and
"active", where previously we only cared about inboxes which were
being actively indexed.
Furthermore, boost checks need to be applied when a
message arrives in different inboxes across multiple
invocations.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210802204058.vscbxs5q7xyolyu2@nitro.local/
|
|
Since extindex uses Xapian shards in a similar way to
v2 inboxes, we'll support -xcpdb (reshard+upgrade) and
-compact all the same to give admins tuning+upgrade
options.
|
|
This wasn't wired up properly, but Xapian appears to suffer from
I/O amplification problems as DB shards get larger:
https://lists.xapian.org/pipermail/xapian-discuss/2019-February/009727.html
<23640.32170.703368.841021@y.dockes.com>
Of course, we shouldn't have too many shards, either, because
performance problems with too many shards were the entire reason
extindex was created:
https://lists.xapian.org/pipermail/xapian-discuss/2020-August/009823.html
<20200826064728.GA32239@dcvr>
|
|
On single CPU (and overloaded SMP) systems, we can't rely on
inotify in lei-daemon firing before a "lei note-event done"
client hits it. So force a single tick() to ensure the
scheduler can yield to lei-daemon and see the inotify wakeup
before "lei note-event done" to commit the write.
|
|
Favor oidbin use internally to reduce internal memory traffic.
|
|
Pretty trivial since it just invokes "git-config". It's mainly
intended to make shell completion easier.
|
|
I just hit an unreproducible failure in t/lei-p2q.t and
lacked $lei_err information to diagnose it. Hopefully
this helps track down odd failures in the future.
|
|
I hit a test failure here, but haven't been able to reproduce
it...
|
|
This makes behavior less surprising since we no longer lose
state on restarts, so there's no need to manually run "lei
add-watch" to re-enable watches. This also allows us to
transparently handle changes if somebody edits the lei config
file directly or via git-config(1).
|
|
This allows lei to automatically note keyword (message flag)
changes made to a Maildir and propagate them into lei/store:
lei add-watch --state=tag-ro /path/to/Maildir
This doesn't persist across restarts, yet. In the future,
it will be applied automatically to "lei q" output Maildirs
by default (with an option to disable it).
State values of tag-rw, index-<ro|rw>, import-<ro|rw> will all
be supported for Maildir.
This represents a major internal change that's fairly
intrusive, but the whole daemon-oriented design was to
facilitate being able to automatically monitor (and propagate)
Maildir/IMAP flag changes.
|
|
This won't blindly append identical key=value pairs, but it
still allows multiple key=value pairs for the same key as
long as the values are different.
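For contrast, git-config(1)'s `--add` appends a new line without checking existing values. The sketch below models the described behavior on an in-memory hash of arrays; `add_unique` and the key name are hypothetical and not lei's actual code.

```perl
use strict;
use warnings;

# Hypothetical model of a multi-valued config key: append $val only if
# that exact value is not already present. Returns 1 if appended.
sub add_unique {
	my ($cfg, $key, $val) = @_;
	my $vals = $cfg->{$key} //= [];
	return 0 if grep { $_ eq $val } @$vals;
	push @$vals, $val;
	1;
}
```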
|
|
This behaves identically to the lei external "boost" parameter in
prioritizing raw messages for extindex.
Relying exclusively on the config file order doesn't work well
for mirrors since it's impossible to guarantee config file
ordering via grokmirror hooks.
Config file ordering remains the default if boost is
unconfigured, or in case of ties.
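A sketch of the ordering rule described above, assuming that a higher boost takes precedence and that an unconfigured boost counts as 0. The inbox list is hypothetical; ties fall back to config-file order by comparing each entry's original index.

```perl
use strict;
use warnings;

# Hypothetical inboxes in config-file order.
my @inboxes = (
	{ name => 'first',  boost => 0 },
	{ name => 'second', boost => 10 },
	{ name => 'third' },              # boost unconfigured: treated as 0
);

# Sort indices: higher boost first (assumption), then config-file order.
my @order = map { $inboxes[$_]{name} }
	sort {
		($inboxes[$b]{boost} // 0) <=> ($inboxes[$a]{boost} // 0)
			|| $a <=> $b
	} 0..$#inboxes;
print "@order\n";
```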
Note: I chose the name "boost" rather than "priority" or "rank"
since I always get confused by whether higher or lower numbers
take precedence when it comes to kernel scheduling. "weight" is
also a part of Xapian API terminology, which we currently do not
expose to configuration (but may in the future).
|
|
Since we require separate PublicInbox::HTTPD instances for each
listen socket address (in order to support {SERVER_<NAME|PORT>}
for PSGI env), the old cache needed to be invalidated on rare
app refreshes.
SIGHUP has always been broken in -httpd (but not -imapd or
-nntpd) due to this cache.
Update the daemon documentation and 5.10.1-ize some bits while
we're in the area.
|
|
This is intended to fix older indices that had deduplication
bugs for matching content. It'll also make dealing with
future changes to ContentHash easier since that's never
guaranteed stable.
It also supports --dry-run to print changes only without
making them.
|
|
ManifestJsGz->response was not invoking the new "url_filter"
method properly. Furthermore, fix url_filter for returning 404
responses.
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87fsx3128a.fsf@kyleam.com/
Fixes: 520be116e8a686cb ("www_listing: start updating for pagination + search")
|
|
Sometimes users (or bots) may lead queries with '&' and
trigger uninitialized variable warnings. Just ignore them
and give consumers a $ctx->{qp}->{''} entry.
While we're in the area, pass a regexp rather than scalar string
to the `split' perlop to prevent Perl from recompiling the
regexp on every call.
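A minimal sketch of both points, using a hypothetical `parse_query` that omits URL-decoding: `split` on a leading '&' yields an empty first field, which becomes a '' key instead of an undefined one, and the separators are precompiled `qr//` objects rather than strings.

```perl
use strict;
use warnings;

my $AMP = qr/&/; # compiled once, reused on every call
my $EQ  = qr/=/;

# Hypothetical parser (no URL-decoding): a leading '&' produces an empty
# pair, stored under the '' key without "uninitialized value" warnings.
sub parse_query {
	my ($qs) = @_;
	my %qp;
	for my $pair (split($AMP, $qs)) {
		my ($k, $v) = split($EQ, $pair, 2);
		$qp{$k // ''} = $v // '';
	}
	\%qp;
}
```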
|
|
This allows us to simplify callers throughout, and exceptions
can no longer be silently hidden. MiscSearch now uses xap_terms
for looking up eidx_key terms for a code reduction.
We also simplify LeiStore->_msg_kw for runtime use by moving the
MsetIterator handling into the t/lei_store.t test case.
|
|
Version 4.0 of highlight has renamed the "make" language to
"makefile". So just check the string starts with "make", to handle
both 3.x and 4.x.
I tested that public-inbox does actually work with highlight 4 -- it
can highlight my Makefile fine. :)
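The prefix check amounts to something like the following; `is_make_lang` is a hypothetical name, not the function used in public-inbox. Anchoring the match at position 0 accepts both "make" (highlight 3.x) and "makefile" (4.x) without also matching names like "cmake".

```perl
use strict;
use warnings;

# Hypothetical check: true if the language name starts with "make".
sub is_make_lang { index($_[0], 'make') == 0 }
```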
|
|
Not 100% sure what's going on, here...
|
|
Since users can't set IMAP flags in read-only IMAP folders,
we won't clobber local flags when importing from IMAP. This
also enables the local_blob fallback used for lei-index to
be used for index deduplication.
|
|
This will eventually be useful for supporting inotify watches
on Maildir. It will also allow users to script their own FS
watchers more easily.
|
|
Leftover while writing the test.
|
|
While other tools can provide the same functionality, having
integration with git-credential is convenient, here. Caching
and completion will be implemented separately.
|
|
This broke recently and lacked an automated test, so rely on
EDITOR=cat to ensure we have some coverage.
Fixes: d2670108f71b1eff ("pkt_op: make pkt_do an OO method")
|
|
On a 4-core CPU, this speeds up "lei import" on a largish
Maildir inbox with 75K messages from ~8 minutes down to ~40s.
Parallelizing alone did not bring any improvement and may
even hurt performance slightly, depending on CPU availability.
However, creating an index on the "fid" and "name" columns in
blob2name alone yields the full speedup.
Parallelizing IMAP makes more sense since most IMAP
stores are non-local and subject to network latency.
Followup-to: bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3 ("lei import: speed up kw updates for old IMAP messages")
|
|
This adds implicit stdin support for p2q and lcat,
while rm and rediff no longer need explicit support
for it.
|
|
Requiring UIDVALIDITY on the command-line is of course
unreasonable.
|
|
lcat can now dump the memoized contents of entire IMAP folders,
not just a single UID. It's now parallelized and pipelined for
multiple lei2mail workers.
Furthermore, various forms of JSON output work consistently
with blob-only output, now.
While working on this, I noticed NetReader was passing UID URLs
to imap_each callbacks, which caused mail_sync.sqlite3 to
store UIDs in `folders'. That was clearly wrong and is now fixed.
|