Date | Commit message |
|
The return value of art_lookup changed but this command wasn't
updated since it wasn't tested.
Fixes: 0e6ceff37fc38f28 ("nntp: support slow blob retrievals")
|
|
v?fork failures seem to be the cause of locks not getting
released in -watch. Ensure lock release doesn't get skipped
in ->done for both v1 and v2 inboxes. We also need to do
everything we can to ensure DB handles, pipes and processes
get released even in the face of failure.
While we're at it, make failures around `git update-server-info'
non-fatal, since smart HTTP seems more popular anyways.
v2 changes:
- spawn: show failing command
- ensure waitpid is synchronous for inotify events
- teardown all fast-import processes on exception,
not just the failing one
- beef up lock_release error handling
- release lock on fast-import spawn failure
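
The lock-release guarantee described above can be sketched as follows. This is a minimal Python illustration, not the project's Perl code: the `Inbox` class, its method names, and the simulated failure are hypothetical, and only the try/finally shape is the point.

```python
import fcntl
import os
import tempfile


class Inbox:
    """Hypothetical stand-in for an inbox writer guarded by a lock file."""

    def __init__(self, lock_path):
        self.lock_path = lock_path
        self.lock_fd = None

    def lock_acquire(self):
        self.lock_fd = os.open(self.lock_path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.flock(self.lock_fd, fcntl.LOCK_EX)

    def lock_release(self):
        # Safe to call more than once: a released lock is a no-op.
        if self.lock_fd is not None:
            fcntl.flock(self.lock_fd, fcntl.LOCK_UN)
            os.close(self.lock_fd)
            self.lock_fd = None

    def done(self, commit):
        """Run the commit step; release the lock even if it raises."""
        try:
            commit()
        finally:
            self.lock_release()


def failing_commit():
    # Simulated spawn failure (assumption for the demo).
    raise OSError("fork failed")


lock_path = os.path.join(tempfile.mkdtemp(), "inbox.lock")
ibx = Inbox(lock_path)
ibx.lock_acquire()
try:
    ibx.done(failing_commit)
except OSError as err:
    print("commit failed:", err)
print("lock still held:", ibx.lock_fd is not None)
```

The error still propagates to the caller, but the lock is gone by the time it does, so a later run is not blocked by a stale lock.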
|
|
Although the ->async_next method does not take $self as
a receiver, but rather a PublicInbox::HTTP object, we may
still retrieve it to be called with the HTTP object via
UNIVERSAL->can.
|
|
This ought to be useful for diagnosing bugs in -watch.
|
|
The temporary clone starts as large as the full msgmap
and deletes will write to it randomly. So ensure it
doesn't get fragmented and slower as time goes on.
|
|
The grep call in list_match_domain_i returns true for all inboxes,
even ones without a URL that matches the regular expression, because
the qr value passed to grep is not surrounded by slashes. Add them.
Fixes: 1988d730c0088e8b (config: support multi-value inbox.*.*url)
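
A rough Python analogue of this pitfall, for readers unfamiliar with Perl's grep: testing the pattern object itself is always true, so every item passes the filter. The URLs and the pattern below are made up for illustration; the actual fix is the Perl change described above.

```python
import re

# Hypothetical inbox URLs and domain pattern.
urls = ["https://example.com/a/", "https://example.org/b/"]
domain_re = re.compile(r"^https?://example\.org/")

# Buggy: checks the truthiness of the pattern object, never the URL.
buggy = [u for u in urls if domain_re]

# Fixed: actually matches each URL against the pattern.
fixed = [u for u in urls if domain_re.search(u)]

print(len(buggy), len(fixed))
```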
|
|
It's possible for ~/.public-inbox/ to not exist if PI_CONFIG
points to an alternate location. Only noticed from the previous
patch fixing t/init.t behavior.
|
|
This gives an opportunity for users already suffering from CoW
fragmentation to at least get the Xapian DBs off CoW. Aside
from over.sqlite3 in v1, the SQLite DBs remain untouched; though
VACUUM support may come in the future.
|
|
SQLite and Xapian files are written randomly, thus they become
fragmented under btrfs with copy-on-write. This leads to
noticeable performance problems (and probably ENOSPC) as these
files get big.
lore/git (v2, <1GB) indexes around 20% faster with this on an
ancient SSD. lore/lkml seems to be taking forever and I'll
probably cancel it to save wear on my SSD.
Unfortunately, disabling CoW also means disabling checksumming
(and compression), so we'll be careful to only set the No_COW
attribute on regeneratable data. We want to keep CoW (and
checksums+compression) on git storage because current ref
storage is neither checksummed nor compressed, and git streams
pack output.
|
|
Otherwise, a user is more likely to remove the msgmap-XXXXXXXX
SQLite file from $TMPDIR and cause SQLite to error out.
|
|
This seems to speed up --reindex on smallish v2 inboxes by about
30% on both HDD and SSD. lore/git (~1GB) on an SSD even gives a
30% improvement with 3 shards. I'm only seeing a ~4% speedup on
LKML with a SATA SSD (which is difficult to repeat because it
takes around 4 hours).
Testing LKML on an HDD will take much more time...
|
|
We can keep the git process more active by sending another
request to it while fetch_run_ops() is running. This
parallelization speeds up mutt's initial FETCH for headers by
around 35%(!).
|
|
And -compact supports --jobs=0 like -index to disable parallel
execution. Running three xapian-compact processes in parallel
on a USB 2.0 HDD is pretty painful.
|
|
We still need to use SQL_BLOB to ensure existing versions of
public-inbox can read over.sqlite3 because they're still using
{sqlite_unicode}. This partially reverts commit
e9fc1290ead44e06d20ff58e0a6acb5306d4fbe2.
Fixes: e9fc1290ead44e06 ("over: unset sqlite_unicode attribute")
|
|
There's no reason for {unindexed} to persist beyond
an ->index_sync call.
|
|
Another closure gone, and we may be able to share more
code with v2 in upcoming commits.
|
|
This allows v1 indexing to run while the `cat-file --batch-check'
process is waiting on high-latency storage.
|
|
Another step in making v1 and v2 more similar.
|
|
This allows us to speed up indexing operations to SQLite
and Xapian.
Unfortunately, it doesn't affect operations using
`xapian-compact' and the compactor API, since that doesn't seem
to support Xapian::DB_NO_SYNC, yet.
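
For the SQLite side, a comparable effect can be sketched with `PRAGMA synchronous = OFF`. This assumes the speedup comes from skipping fsync, as the mention of Xapian::DB_NO_SYNC suggests; the function name and the `no_fsync` parameter are hypothetical, and the project's own Perl code may do this differently.

```python
import os
import sqlite3
import tempfile


def open_for_bulk_index(path, no_fsync=False):
    """Open an SQLite DB, optionally trading durability for speed."""
    conn = sqlite3.connect(path)
    if no_fsync:
        # SQLite stops waiting for data to reach stable storage.
        # Only reasonable for data that can be regenerated.
        conn.execute("PRAGMA synchronous = OFF")
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "over.sqlite3")
conn = open_for_bulk_index(db_path, no_fsync=True)
sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]
print("synchronous level:", sync_level)
conn.close()
```

A crash or power loss during indexing may then corrupt the database, which is acceptable only because the index can be rebuilt from git.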
|
|
We'll switch to using IdxStack here to ensure we get repeatable
results and ascending THREADIDs according to git chronology.
This means we'll need a two-pass reindex to index existing
messages before indexing new messages.
Since we no longer have a long-lived git-log process, we don't
have to worry about old Xapian referencing the git-log pipe
w/o FD_CLOEXEC, either.
|
|
The "xdb" prefix was inaccurate since it's used by
indexlevel=basic, which is Xapian-free. The '_' (underscore)
prefix was also wrong for a method which is called across
package boundaries.
|
|
This was a bug. I'm not sure where it matters yet, but it
may matter in the future.
|
|
|
|
Since normal per-epoch indexing no longer holds a "git log"
process open, we don't need to worry about not sharing the
pipe with forked shards when we restart the indexer.
While we're in the area, better describe what `unindex' does,
since it's a rarely-used but necessary code path.
|
|
We can reduce the number of parameters we pass around on stack
and make our read-write and read-only code paths more uniform.
|
|
Instead, storing {xdir} will allow us to avoid string
concatenation in the read-only path and save us a little
hash entry space.
|
|
This is a step which makes our use of abbreviations more
consistent when referring to PublicInbox::Inbox objects.
We'll also be reducing the number of redundant fields
in SearchIdx and V2Writable code paths to make the
object graph easier-to-follow.
|
|
It'll be one continuous range with IdxStack.
|
|
Another step in slowly updating our code to support SHA-256 or
whatever other hash algorithms git may support in the future.
|
|
The V2Writable object may be long-lived, so it makes more
sense to put the {autime} and {cotime} fields into the
shorter-lived index_sync state.
|
|
Instead of doing fill_alternates for every epoch we're indexing,
just do it once at the start of index_sync invocation. This
will set us up for using a single "git cat-file" process for
indexing multiple epochs.
|
|
This avoids pinning a potentially large chunk of memory from
`git-log --reverse' into RAM (or triggering less predictable
swap behavior). Instead it uses a contiguous temporary file
with a fixed-size record for every blob we'll need to index.
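
The fixed-size record idea can be sketched as below. The record layout (a type byte, two timestamps, a 20-byte SHA-1) and the function names are assumptions for illustration, as is the premise that records are written newest-first and read back from the end of the file.

```python
import struct
import tempfile

# Hypothetical layout: 1-byte type, two 64-bit timestamps, 20-byte SHA-1.
RECORD = struct.Struct(">cQQ20s")


def push(fh, kind, autime, cotime, oid_hex):
    """Append one fixed-size record to the temporary file."""
    fh.write(RECORD.pack(kind, autime, cotime, bytes.fromhex(oid_hex)))


def pop_all(fh):
    """Yield records from the end of the file back to the start."""
    fh.flush()
    pos = fh.seek(0, 2)  # seek to the end; returns the file size
    while pos >= RECORD.size:
        pos -= RECORD.size
        fh.seek(pos)
        kind, autime, cotime, oid = RECORD.unpack(fh.read(RECORD.size))
        yield kind, autime, cotime, oid.hex()


with tempfile.TemporaryFile() as fh:
    # Written newest-first, as a plain `git log' would emit commits.
    push(fh, b"m", 300, 301, "cc" * 20)
    push(fh, b"m", 200, 201, "bb" * 20)
    push(fh, b"m", 100, 101, "aa" * 20)
    order = [rec[1] for rec in pop_all(fh)]

print(order)  # oldest first: [100, 200, 300]
```

Because every record has the same size, reading in reverse needs only seeks at multiples of `RECORD.size`, and memory use stays constant regardless of history length.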
|
|
Since we'll need to expose THREADID to JMAP and IMAP users,
index all messages in the order they were committed to ensure
our `tid' (thread ID) column ascends in mirrors the same way
it does in the source inbox.
This drastically simplifies our code but increases memory
usage of `git-log'. The next commit will bring memory use
back down at the expense of $TMPDIR usage.
|
|
Versions of public-inbox before 1.3.0 had subtly
different semantics around threading in some corner
cases. This switch (when combined with --reindex)
allows us to fix them by regenerating associations.
|
|
Noticed while reindexing a largish v2 inbox in parallel on an
SSD which required checkpointing and respawning shard workers.
Fixes: f06e84220e5566e7 ("over+msgmap: do not store filename after DBI->connect")
|
|
We can rely on FD_CLOEXEC being set by default (since Perl 5.6+)
on pipes to avoid FS/page-cache traffic, here. We also know
"git hash-object" won't output anything until it's consumed all
of its standard input; so there's no danger of a deadlock even
in the unlikely case git uses a hash that can't fit into
PIPE_BUF :P
|
|
Since over.sqlite3 seems here to stay, we no longer need to do
Message-ID lookups against Xapian and can simply rely on the
docid <=> NNTP article number equivalence SCHEMA_VERSION=15
gave us.
This rids us of the closure-using batch_do sub in the v1
code path and vastly simplifies both v1 and v2 unindexing.
|
|
Prefer "parent" to "base" since the former is lighter and part
of Perl 5.10+. We'll also rely on warnings from "-w" globally
(or not) instead of via "use".
|
|
OO method dispatch was 10-15% slower when I was implementing the
NNTP server. It also serves as a helpful reminder to the reader
at the callsite as to whether a sub is likely in the same
package as the caller or not.
|
|
This saves runtime allocations and reduces the likelihood of
memory leaks either from cycles or buggy old Perl versions.
|
|
While it makes the code flow slightly less well in some places,
it saves us runtime allocations and indentation.
|
|
In case this ends up in the same process as Mbox::msg_hdr,
it can reduce memory use by sharing the cache key in
PublicInbox::Eml::re_memo
|
|
We only support Unix-like platforms where binmode (":raw") is
the default anyways, and v5.10 semantics means it won't do
unicode_strings (unlike v5.12). So save some lines of code.
|
|
The "5.010_001" form was for Perl 5.6, which I doubt anybody
would attempt; so favor "v5.10.1" as it is more readable to
humans. Prefer "parent" to "base" since the former is lighter.
We'll also rely on warnings from "-w" globally (or not) instead
of via "use".
We'll also update "use" statements to reflect what's actually
used by V2Writable.
|
|
"\n" and other characters requiring quoting and/or escaping in
$GIT_DIR/objects/info/alternates were not supported in git 2.11
and earlier; nor do they seem supported at all in libgit2.
This will allow us to support sharing git-cat-file or similar
endpoints across multiple inboxes via alternates.
This breaks an existing use case for anybody wacky
enough to put `\n' in the `inboxdir' pathname; but I doubt
this affects anybody.
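
A minimal sketch of such a check, in Python rather than the project's Perl. The function names and the "/objects" suffix are hypothetical; the only point is that a path containing a newline is rejected before it can be written as a line of an alternates file.

```python
def check_inboxdir(path):
    """Reject paths that cannot be stored one-per-line in alternates."""
    if "\n" in path:
        raise ValueError("inboxdir must not contain a newline: %r" % path)
    return path


def alternates_line(inboxdir):
    """Build one alternates entry (hypothetical layout for the demo)."""
    return check_inboxdir(inboxdir) + "/objects\n"


print(repr(alternates_line("/srv/inbox.git")))
try:
    alternates_line("/srv/bad\nname.git")
except ValueError as err:
    print("rejected:", err)
```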
|
|
SQLite already knows the filename internally, so avoid having it
as a long-lived Perl SV to save some bytes when there are many
inboxes and open DBs.
|
|
While it's even less common to experience a replaced
msgmap.sqlite3 file, BOFHs may do the darndest things. This is
another step towards reducing the number of needless wakeups
we need to do in long-lived read-only daemons.
|
|
None of the human-readable strings stored in over.sqlite3
require UTF-8. Message-IDs do not, nor do the compressed
Subject IDs (sid) we use for Subject-based threading. And the
`ddd' (doc-data-deflated) column is of course binary data.
This frees us of having to use SQL_BLOB for the `ddd' column,
and will open the door for us to use dbh_new for Msgmap, too.
|
|
We must not trigger wakeups on InboxIdle users until after we've
renamed all files into place. Otherwise, the InboxIdle caller
may just reopen the old (soon-to-be-unlinked) file.
This fixes occasional test failures in t/nntpd.t
Fixes: f977826a17f8735e ("lock: reduce inotify wakeups")
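
The ordering requirement can be sketched as: rename first, notify second. The Python below is an illustration only; `replace_then_notify`, the callback, and the file contents are hypothetical and stand in for the real lock and InboxIdle code.

```python
import os
import tempfile


def replace_then_notify(final_path, data, notify):
    """Write a new file, rename it into place, and only then wake readers."""
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    # Atomic on POSIX when both paths are on the same filesystem.
    os.replace(tmp_path, final_path)
    # Readers woken here are guaranteed to reopen the new file.
    notify(final_path)


seen = []


def on_wakeup(path):
    # Hypothetical reader: reopens the file as soon as it is woken.
    with open(path, "rb") as fh:
        seen.append(fh.read())


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "msgmap.sqlite3")
with open(target, "wb") as fh:
    fh.write(b"old")

replace_then_notify(target, b"new", on_wakeup)
print(seen)  # [b'new']
```

Calling `notify` before `os.replace` would let the reader reopen the old file, which is the race the commit describes.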
|
|
Instead of returning "BAD program fault", just give the
standard "BAD search not available"... message we show
for mailbox slices.
|