about summary refs log tree commit homepage
path: root/t/v2writable.t
DateCommit message (Collapse)
2021-09-15multi_git: hoist out common epoch/alternates handling
IMHO, this greatly improves code sharing and organization between v2, extindex, and lei/store. Common git-related logic for these is lightly-refactored and easier to reason about. The impetus for this big change was to ensure inboxes created+managed by public-inbox-{clone,fetch} could have alternates and configs setup properly without depending on SQLite (via V2Writable). This change does that while making old code shorter and better factored.
2021-03-15t/*: disable fsync on tests were create_inbox isn't worth it
Using create_inbox doesn't seem worth the trouble, here, at the moment, but disabling fsync(2) gives a noticeable speedup on my system even with an SSD.
2021-02-08tests: favor IPv6
IPv4 gets plenty of real-world coverage, and apparently there's Debian buildd hosts which lack IPv4(*). So ensure everything can work on IPv6 and not cause problems for odd setups. (*) https://bugs.debian.org/979432
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-11-07t/v2writable: remove pointless ->barrier call
We don't actually use it anywhere, and may not need it in the future.
2020-09-16treewide: relax allow >=40 chars for git OID
This will help with eventual git SHA-256 transitions.
2020-09-03search: replace ->query with ->mset
Nearly all of the search uses in the production code rely on a Xapian mset iterator being returned (instead of an array of $smsg objects). So default to returning the mset and move the burden of smsg array conversion into the test cases.
2020-09-03use more idiomatic internal API for ->over access
{over_ro} being a part of the Search object is a historical oddity which will go away, soon. Lets start removing its use in tests and rarely-used helper scripts.
2020-08-27over: recent: remove expensive COUNT query
As noted in commit 87dca6d8d5988c5eb54019cca342450b0b7dd6b7 ("www: rework query responses to avoid COUNT in SQLite"), COUNT on many rows is expensive on big SQLite DBs. We've already stopped using that code path long ago in WWW while -imapd and -nntpd never used it. So we'll adjust our remaining test cases to not need it, either.
2020-06-03smsg: remove remaining accessor methods
We'll continue to favor simpler data models that can be used directly rather than wasting time and memory with accessor APIs. The ->from, ->to, -cc, ->mid, ->subject, >references methods can all be trivially replaced by hash lookups since all their values are stored in doc_data. Most remaining callers of those methods were test cases, anyways. ->from_name is only used in the PSGI code, so we can just use ->psgi_cull to take care of populating the {from_name} field.
2020-05-12rename "ContentId" to "ContentHash"
The old name may be confused with "Content-ID" as described in RFC 2392, so use an alternate name to avoid confusing future readers.
2020-05-09remove most internal Email::MIME usage
We no longer load or use Email::MIME outside of comparison tests.
2020-04-22t/*.t: reduce dependency on Email::MIME APIs
Instead, favor PublicInbox::MIME->new for non-attachment emails. We may support alternatives to Email::MIME down the line. We'll still keep Email::MIME->create to deal with attachments, for now, but there's also a fair amount of test duplication we should eliminate, later.
2020-04-22t/*.t: use Email::MIME->create over PublicInbox::MIME->create
PublicInbox::MIME only supports ->new, and is only different from Email::MIME for old versions of Email::MIME. In the future, PublicInbox::MIME may not be a subclass of Email::MIME at all.
2020-04-20testcommon: spawn-aware system() and qx[] workalikes
Barely noticeable on Linux, but this gives a 1-2% speedup on a FreeBSD 11.3 VM and lets us use built-in redirects rather than relying on /bin/sh.
2020-03-31v2writable: index Message-IDs w/ spaces properly
Message-IDs can apparently contain spaces and other weird characters. Ensure we pass those properly to shard subprocesses when importing messages in parallel mode. Our NNTP request parser does not deal with spaces in the Message-ID, yet, and I don't expect most NNTP clients to, either. Nor does the Net::NNTP client handle them in responses.
2020-02-24v2writable: make remove return-compatible w/ Import::remove
Import::remove is a documented interface, and the return value of the V2Writable work-alike should try to be compatible with what Import implements.
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-28v2writable: newest epochs go first in alternates
New epochs are the most likely to have loose objects. git won't be able to take advantage of pack indices and needs to scan every alternate for the loose object via open/openat syscalls. Those syscalls will add up some day when we've got hundreds or thousands of epochs.
2019-12-24testcommon: add require_mods method and use it
This cuts down on lines of code in individual test cases and fixes some misnamed error messages by using "$0" consistently. This will also provide us with a method of swapping out dependencies which provide equivalent functionality (e.g "Xapian" SWIG can replace "Search::Xapian" XS bindings).
2019-12-19tests: move t/common.perl to PublicInbox::TestCommon
We want to be able to use run_script with *.t files, so t/common.perl putting subs into the top-level "main" namespace won't work. Instead, make it a module which uses Exporter like other libraries.
2019-11-24tests: quiet down commit graph
Newer versions of git enable the commit graph by default. Since we blow away our temporary directories every test, generating graphis is a waste and clutters stderr with "Computing commit graph generation numbers" messages.
2019-11-24tests: use File::Temp->newdir instead of tempdir()
We'll also introduce a tmpdir() API to give tempdirs consistent names.
2019-11-24t/common: start_script replaces spawn_listener
We can shave several hundred milliseconds off tests which spawn daemons by preloading and avoiding startup time for common modules which are already loaded in the parent process. This also gives ENV{TAIL} support to all tests which support daemons which log to stdout/stderr.
2019-11-08t/*.t: disable nntpd/httpd worker processes in most tests
And explicitly test for respawning in t/httpd-corner.t There's no need to have an extra entries in the process table for most tests we run, since that's not what we're testing.
2019-10-30Merge branch 'learn'
* learn: doc: add public-inbox-learn(1) manpage mda: support multiple List-ID matches mda: prepare for multiple destinations inboxwritable: add assert_usable_dir sub mda: skip MIME parsing if spam mda: hoist out mda_filter_adjust filter/base: remove MAX_MID_SIZE constant mda: hoist out List-ID handling and reuse in -learn learn: hoist out remove_or_add subroutine learn: GIT_COMMITTER_<NAME|EMAIL> may be "" or "0" learn: update usage statement learn: only map recipient list on "ham" or "rm" learn: support multiple To/Cc headers
2019-10-30inboxwritable: add assert_usable_dir sub
And use it for mda, since "0" could be a usable directory if somebody insists on using relative paths...
2019-10-28index: allow search/lookups on X-Alt-Message-ID
Since we replace extra Message-ID headers with X-Alt-Message-ID to placate NNTP clients, we should allow searching and indexing on X-Alt-Message-ID just like we do with Message-ID.
2019-10-16config: support "inboxdir" in addition to "mainrepo"
"mainrepo" ws a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-09-09run update-copyrights from gnulib for 2019
2019-09-09tests: add tcp_connect() helper
IO::Socket::INET->new is rather verbose with the options hash, extract it into a standalone sub
2019-06-30tests: common tcp_server and unix_server helpers
IO::Socket:*->new options are verbose and we can save a bunch of code by putting this into t/common.perl, since the related spawn_listener stuff is already there.
2019-06-24msgmap: mid_insert: use plain "INSERT" to detect duplicates
"INSERT OR IGNORE" still bumps the auto-increment counter in SQLite, which causes gaps to appear in NNTP article numbering. This bug appeared in v2 repos where V2Writable may call ->add repeatedly on the same message. This bug is apparent with public-inbox-watch and work-in-progress IMAP watchers which may rescan and (attempt to) reinsert the same message on mailbox changes. Most uses of public-inbox-mda were not affected, unless the same message is actually delivered multiple times to the mda. v1 is not affected, either, since deduplication is only based on Message-ID and msgmap never sees the duplicate. Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-06-14v2writable: rename {partitions} field to {shards}
Our internal data structure should be consistent with Xapian terminology.
2019-05-15www: use Inbox->over where appropriate
We don't need to rely on Xapian search functionality for the majority of the WWW code, even. subject_normalized is moved to SearchMsg, where it (probably) makes more sense, anyways.
2019-05-14tests: remove unnecessary loading of ::DS and Socket
PublicInbox::DS works for every platform we we care about, nowadays; so checking for it is a waste of time. Cleanup a few POSIX and Socket imports while we're in the area.
2019-05-14v2writable: allow setting nproc via creat options
Avoiding reliance on environment variables is a bit cleaner for writing tests
2019-05-08Merge remote-tracking branch 'origin/danga-bundle'
* origin/danga-bundle: DS: epoll: fix misordered EPOLL_CTL_DEL call DS: drop unused "_undef" sub syscall: drop readahead wrapper build: do not manify DS and Syscall pods DS: handle EINTR in IO::Poll path, too DS: workaround IO::Kqueue EINTR (mis-)handling DS: drop profiling support DS: remove unused fields and functions listener: use EPOLLEXCLUSIVE for listen sockets bundle Danga::Socket and Sys::Syscall
2019-05-06index: warn with info about the message as context
This can help users track down the source of warnings when presented with imperfect emails. While we're at it, make the __WARN__ callback in t/v2writable.t a no-op since we don't check for warnings, there.
2019-05-04bundle Danga::Socket and Sys::Syscall
These modules are unmaintained upstream at the moment, but I'll be able to help with the intended maintainer once/if CPAN ownership is transferred. OTOH, we've been waiting for that transfer for several years, now... Changes I intend to make: * EPOLLEXCLUSIVE for Linux * remove unused fields wasting memory * kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615 * accept4 support And some lower priority experiments: * switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes) * nginx-style buffering to tmpfile instead of string array * sendfile off tmpfile buffers * io_uring maybe?
2019-01-11v2writable: ->purge returns undef on no-op
And doesn't try to access undef as an array ref.
2019-01-10t/v2writable.t: force more consistent "git log" output
This should probably use lower-level git plumbing, but until then, consistently add a bunch of --no-* options to "git log" to get more consistent output. Noticed-by: Johannes Berg https://public-inbox.org/meta/1538164205.14416.76.camel@sipsolutions.net/
2019-01-10check git version requirements
This allows v1 tests to continue working on git 1.8.0 for now. This allows git 2.1.4 packaged with Debian 8 ("jessie") to run old tests, at least. I suppose it's safe to drop Debian 7 ("wheezy") due to our dependency on git 1.8.0 for "merge-base --is-ancestor". Writing V2 repositories requires git 2.6 for "get-mark" support, so mask out tests for older gits.
2018-12-29tests: consolidate process spawning code.
IPC::Run provides a nice simplification in several places; and we already use it (optionally) on a lot of tests. For the non-test code, we still rely on our vfork-capable Inline::C stuff since real-world server processes can get large enough to where vfork is an advantage. Maybe Perl5 can use CLONE_VFORK somehow, one day: https://rt.perl.org/Ticket/Display.html?id=128227 Ohg V'q engure cbeg choyvp-vaobk gb Ehol :C
2018-07-18t/search.t t/v2writable.t: Teach search tests to fail more cleanly.
Now that some of the indexes are optionals these tests might fail so teach them to fail more cleanly. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-05-30respect umask if core.sharedRepository is not set
This is consistent with git itself and the previous behavior was a result of misunderstanding of how git interprets this. And adjust tests slightly to match the new behavior. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> <38873789-ab42-65a1-20c9-12c30b171f4f@linuxfoundation.org>
2018-04-18ensure SQLite and Xapian files respect core.sharedRepository
We can't have files with permissions inconsistent with what's in git objects.
2018-04-18v2: generate better Message-IDs for duplicates
While hunting duplicates, I noticed a leading '-' in some Message-IDs as a result of RFC4648 encoding. While '-' seems allowed by RFC5322 and URL-friendly (RFC4648), they are uncommon and make using Message-IDs as arguments for command-line tools more difficult. So prefix them with a datestamp to at least give readers some sense of the age. And shorten the "localhost" hostname to "z" to save space.
2018-04-07store less data in the Xapian document
Since we only query the SQLite over DB for OVER/XOVER; do not need to waste space storing fields To/Cc/:bytes/:lines or the XNUM term. We only use From/Subject/References/Message-ID/:blob in various places of the PSGI code. For reindexing, we will take advantage of docid stability in "xapian-compact --no-renumber" to ensure duplicates do not show up in search results. Since the PSGI interface is the only consumer of Xapian at the moment, it has no need to search based on NNTP article number.
2018-04-07v2writable: reduce barriers
Since we handle the overview info synchronously, we only need barriers in tests, now. We will use asynchronous checkpoints to sync less-important Xapian data. For data deduplication, this requires us to hoist out the cat-blob support in ::Import for reading uncommitted data in git.