Date | Commit message (Collapse) |
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
* origin/master: (58 commits)
ds: flatten + reuse @events, epoll_wait style fixes
ds: simplify EventLoop implementation
check defined return value for localized slurp errors
import: check for git->qx errors, clearer return values
git: qx: avoid extra "local" for scalar context case
search: remove {mset} option for ->mset method
search: remove pointless {relevance} setting
miscsearch: take reopen from Search and use it
extsearch: unconditionally reopen on access
extindex: allow using --all without EXTINDEX_DIR
extindex: add undocumented --no-scan switch
extindex: enable autoflush on STDOUT/STDERR
extindex: various --watch signal handling fixes
extindex: --watch for inotify-based updates
eml: fix undefined vars on <Perl 5.28
t/config: test --get-urlmatch for git <2.26
default to CORE::warn in $SIG{__WARN__} handlers
inbox: name variable for values loop iterator
inboxidle: avoid needless syscalls on refresh
inboxidle: clue users into resolving ENOSPC from inotify
...
|
|
Since ExtSearch lacks the janky cleanup timer of
PublicInbox::Inbox objects, its search results get stale.
Reopen the Xapian DB on every ->search call for now, as
reducing reopen calls doesn't seem worth the complexity.
The Xapian::Database::reopen operation itself takes only ~50us
on my old workstation with 3 shards totaling <200GB. Other
parts of Xapian dominates the search time, so the reopen seems
inconsequential with single-digit shard counts.
|
|
Unlike inboxdir, the canonical-ness of -extindex paths is not
relevant at the moment, and may never be relevant at all. So
don't mislead others into thinking these paths being
canonicalized matters.
|
|
This reduces differences between v1 and v2 code, and
introduces ->xdb_shards_flat to provide read-only access
to shards without using Xapian::MultiDatabase. This
will allow us to combine shards of several inboxes
AND extindexes for lei.
|
|
Still unstable, this builds off the equally unstable extindex :P
This will be used for caching/memoization of traditional mail
stores (IMAP, Maildir, etc) while providing indexing via Xapian,
along with compression, and checksumming from git.
Most notably, this adds the ability to add/remove per-message
keywords (draft, seen, flagged, answered) as described in the
JMAP specification (RFC 8621 section 4.1.1).
We'll use `.' (a single period) as an $eidx_key since it's an
invalid {inboxdir} or {newsgroup} name.
|
|
Using "eidx_key:" boolean prefix to limit results to a given
inbox, we can use ->ALL to emulate and replace per-Inbox
xap15/[0-9] search indices.
With this change, the presence of "extindex.all.topdir" in the
$PI_CONFIG will cause the WWW code to use that extindex and
ignore per-inbox Xapian DBs in xap15/[0-9].
Unfortunately IMAP search still requires old per-inbox indices,
for now. Mapping extindex Xapian docids to per-Inbox UIDs and
vice-versa is proving tricky. Fortunately, IMAP search is
rarely used and optional. The RFCs don't specify expensive
phrase search, either, so `indexlevel=medium' can be used in
per-inbox Xapian indices to save space.
For primarily WWW (and future JMAP) users; this should result in
significant disk space, FD, and page cache footprint savings for
large instances with many inboxes and many cross-posted
messages.
|
|
For ->ALL users, this mitigates the regression introduced
by commit 811b8d3cbaa790f59b7b107140b86248da16499b
("nntp: xref: use ->ALL extindex if available"), since
it's common to cross post messages to some mailing
lists with per-list trailers for unsubscribe information.
We won't bother dealing with Bcc-ed messages since those
are nearly all spam when it comes to public mailing lists.
Fixes: 811b8d3cbaa790f5 ("nntp: xref: use ->ALL extindex if available")
Link: https://public-inbox.org/meta/20201130194201.GA6687@dcvr/
|
|
Getting Xref for cross-posted messages is an O(n) operation
where `n' is the number of newsgroups on the server. This works
acceptably when there are dozens of groups, but would be
unnacceptable when there's tens of thousands of newsgroups.
With ~140 newsgroups, a lore.kernel.org mirror already handles
"XHDR Xref $MESSAGE_ID" requests around 30% faster after
creating the xref3.idx_nntp index.
The SQL additions to ExtSearch.pm may be a bit strange and
seem more appropriate for Over.pm; however it currently makes
sense to me since those bits of over.sqlite3 access are
exclusive to ExtSearch and can't be used by traditional
v1/v2 inboxes...
|
|
We'll replace "$EINDEX" => "$EXTINDEX" in a user-visible
line and also some hacker-only tests.
"eindex" is no longer used because it rhymes with "reindex",
so remove the last instance of it.
Fixes: 6b0fed3b03263ba2 ("extsearch: rename -eindex to -extindex")
|
|
This will be used to index and search Inbox objects and perhaps
individual git repositories/epochs for grokmirror manifest.js.gz
generation. There is no sharding planned for this at the moment
since inbox count should remain low (~100K to 1M) compared to
message count.
Folding this into the existing sharded DBs could be possible;
but would likely increase query and maintenance costs, as well
as development complexity. So we'll use a few more inodes and
FDs at runtime, instead.
|
|
This makes `ps' output look a bit nicer if there's trailing
slashes involved from the command-line.
|
|
This lets us pretend an ExtSearch object is an Inbox object
in most of the existing WWW code.
|
|
We'll probably still need synchronous message retrieval
in a few places (tests, at least).
|
|
We can simplify callers by using $self->{xpfx} instead of
passing another arg on the stack.
|
|
This will provide a similar API to PublicInbox::Inbox for
read-only WWW, -imapd, and -nntpd interfaces.
|