path: root/t
Date Commit message
2021-10-13 t/lei-mirror: avoid reading ~/.public-inbox/config in test
Oops, we shouldn't attempt to read a user's actual HOME when running -index, since mine has a bunch of invalid entries in there.
2021-10-13 t/www_listing: require opt-in for grokmirror tests
grokmirror 2.x seems to idle in several places for 5s at-a-time, causing t/www_listing.t to take longer than "make check-run" on a 4-core system when run without grokmirror. So make it optional but add some test knobs to allow tailing the log output so I can see what's going on.
2021-10-12 www: _/text/config/raw Last-Modified: is mm->created_at
This allows IMAP mirrors to keep UIDVALIDITY synchronized (and "LIST ACTIVE.TIMES" in NNTP). "lei add-external --mirror" will automatically set it, as will the combination of public-inbox-clone + public-inbox-index. This avoids the need for extra endpoints or config entries, at least...
2021-10-12 msgmap: ->new_file supports $ibx arg, drop ->new
The original Msgmap->new API was v1-specific and not necessary. The ->new_file API now supports an $ibx object being passed to it, simplifying -no_fsync use. It will also make an upcoming change easier...
2021-10-12 daemon: unconditionally close Xapian shards on cleanup
The cost of opening a Xapian DB (even with shards) isn't high, so save some FDs and just close it. We hit Xapian far less than over.sqlite3 and we discard the MSet ASAP even when streaming large responses. This simplifies our code a bit and hopefully helps reduce fragmentation by increasing mortality of late allocations.
2021-10-12 search: delete QueryParser along with DB handle
Xapian::QueryParser is attached to the Xapian::Database, so holding onto the QueryParser was preventing us from releasing DB handles if a query was performed.
2021-10-12 extindex: avoid invalid blobs after unref
When unref-ing a blob from xref3, make sure the "preferred" smsg->{blob} doesn't point to the blob we just unrefed. This is necessary because we periodically checkpoint our extindex process to allow -watch and -mda processes to run. This also gets rid of a lot of redundant code for ->remove_xref3, since it's all handled in ExtSearchIdx, now.
2021-10-09 extindex: support --reindex --fast
This mode only checks history for missed/stale messages and doesn't attempt to reindex messages which are already indexed.
2021-10-08 git: fatalize async callback errors by default
This should help us catch BUG: errors (and then some) in -extindex and other read-write code paths. Only read-only daemons should warn on async callback failures, since those aren't capable of causing data loss.
2021-10-08 git: use async_wait_all everywhere
Some code paths may use maximum size checks, so ensure any checks are waited on, too.
2021-10-05 index: --reindex w/ --{since,until,before,after}
This lets administrators reindex specific time ranges according to git "approxidate" formats. These arguments are passed directly to underlying git-log(1) invocations and may still reach into old epochs. Since these options rely on git committer dates (which we infer from the most recent Received: header), they are not guaranteed to be strictly tied to git history and it's possible to over/under-reindex some messages. It's probably not a major problem in practice, though; reindexing a few extra messages is generally harmless aside from some extra device wear. Since this currently relies on git-log, these options do not affect -extindex, yet.
2021-10-04 hl_mod: don't memoize highlight::codeGenerator objects
Making them immortal doesn't seem worth it, since doing immortal allocations after process startup leads to fragmentation. While the allocations made by highlight are small, those small allocations can break up contiguous regions and prevent consolidation by the malloc implementation. Since instantiating code generators doesn't seem too expensive, just use and delete them ASAP.
2021-10-04 www: fix ref cycle from threading w/ extindex
Unlike v1 inboxes (which don't accept duplicate Message-IDs at all), and v2 inboxes (which generate a new Message-ID for duplicates), extindex must accept duplicate Message-IDs as-is. This was fine for storage, but prevented the reference-cycle mechanism of our message threading display algorithm from working reliably. It could no longer delete the ->{parent} field from clobbered entries in the %id_table. So we now take into account reused Message-IDs and never clobber entries in %id_table. Instead, we mark reused Message-IDs as "imposters" and special-case them by injecting them as children after all other threading is complete. This cycle was noticed using a pre-release of Devel::Mwrap::PSGI: https://80x24.org/mwrap-perl.git
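The "never clobber, inject imposters last" idea described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's Perl code: the function name `thread`, the dict-based message shape, and the `parent_mid` key are all hypothetical, and the sketch does not guard against messages that reference each other in a loop.

```python
def thread(msgs):
    """Build a thread tree from dicts with 'mid' and 'parent_mid' keys.

    The message shape is a hypothetical stand-in for real message objects.
    Nodes hold only child links, so no parent back-references exist that
    would later need to be deleted to break a cycle.
    """
    id_table = {}
    imposters = []
    for m in msgs:
        node = {"msg": m, "children": []}
        if m["mid"] in id_table:
            # Reused Message-ID: keep the existing entry, set this one aside.
            imposters.append(node)
        else:
            id_table[m["mid"]] = node

    roots = []
    for node in id_table.values():
        parent = id_table.get(node["msg"].get("parent_mid"))
        if parent is not None and parent is not node:
            parent["children"].append(node)
        else:
            roots.append(node)

    # After all other threading is complete, attach each imposter as a
    # child of the entry that owns its Message-ID.
    for node in imposters:
        id_table[node["msg"]["mid"]]["children"].append(node)
    return roots
```

With three messages where the third reuses the Message-ID of the first, the first stays the root and the duplicate shows up as its last child.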
2021-10-04 t/thread-cycle: make Email::Simple optional
We only use it if Mail::Thread is available, and often it's not.
2021-10-02 lei inspect: fix "mid:" prefix, expand to Xapian
This fixes inspect for uninitialized instances, and adds Xapian ("xdoc") output if available. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Message-ID: <20211001204943.l4yl6xvc45c5eapz@meerkat.local>
2021-10-01 ds: simplify signalfd use
Since signalfd is often combined with our event loop, give it a convenient API and reduce the code duplication required to use it. EventLoop is replaced with ::event_loop to allow consistent parameter passing and avoid needlessly passing the package name on stack. We also avoid exporting SFD_NONBLOCK since it's the only flag we support. There's no sense in having the memory overhead of a constant function when it's in cold code.
2021-10-01 daemon: make SO_ACCEPTFILTER a shared variable
Constant subroutines use more memory and there's no need to optimize it for inlining since it's only used at startup.
2021-09-29 t/solver_git: fix test to work with git <2.29
'git diff --abbrev=40' did not abbreviate /^index / lines of diff output with git <2.29, and 40 will be insufficient for SHA-256. --full-index has been around since 2005, so it's safe to rely on. Tested git version 2.20.0 (Debian buster). Fixes: 751df49e7db8ba77 ("lei rediff: add --drq and --dequote-only")
2021-09-29 inbox: drop memoization/preload, cleanup expires caches
cloneurl, description, and base_url are no longer memoized. The non-$env form of base_url is rare in WWW, and is fast enough to not require memoization. cloneurl and description are now expired during cleanup, allowing admins to change these files without restarting (or SIGHUP). -altid_map is no longer cached nor memoized at all, since the endpoint(s) which hit it seem rarely accessed. nntp_url and imap_url are now cached (instead of memoized) in case an inbox is unvisited for a long time. They remain cached since the truthiness check gets called in every per-inbox HTML page, which can potentially be expensive.
2021-09-27 t/lei-index: IMAP and NNTP dependencies are optional
"lei index" support for IMAP and NNTP is incomplete, so there's no point in requiring them. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
2021-09-27 fetch: support running as root
The "-w" perlop always succeeds as root, so we need to check st_mode for writability bits to detect directories we shouldn't write to. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
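The distinction above, between an access check that root always passes and the permission bits actually stored on the directory, can be illustrated with a short Python sketch. The helper name `has_write_bits` is hypothetical, and testing all three of the user/group/other write bits is an assumption; the commit only says that st_mode is checked for writability bits.

```python
import os
import stat


def has_write_bits(path):
    """Return True if any write bit (user, group, or other) is set on path.

    Unlike os.access(path, os.W_OK), which reports success for root
    regardless of the mode, this looks only at the stored st_mode bits.
    """
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

A directory set to mode 0555 is reported as non-writable by this helper even when the caller is root, which is the behavior a "skip read-only directories" rule needs.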
2021-09-27 t/cmd_ipc: allow extra errors and add diagnostics
Apparently, sendmsg can fail in less common ways when network buffers are gigantic. Add some diagnostics for future failures, as well. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
2021-09-27 lei rediff: add --drq and --dequote-only
More switches which can be useful for users who pipe from text editors. --drq can be helpful while writing patch review email replies, and perhaps --dequote-only, too.
2021-09-27 lei rediff: quiet warnings from Import and Eml
lei rediff is expected to see partial patch fragments and such, so silence warnings when something isn't exactly a valid email message.
2021-09-26 t/run.perl: less confusing error reporting
The $sigchld handler was reporting the last test (successful or not) for a given PID in case a worker dies prematurely. Instead, redisplay all failed tests in $run_log to ensure the report only shows failed tests, and not the last started (and possibly successful) one.
2021-09-25 t/v2mirror: check dependencies for legacy test
We still need Email::MIME to test against old revisions. We'll also depend on the revision just prior to the manifest.js.gz introduction to avoid loading Danga::Socket, since it was getting loaded even with `plackup'. Finally, we'll disable Inline::C usage with old Spawn.pm since our old code included alloca.h, which is not portable to FreeBSD.
2021-09-24 fetch: support v2 w/o manifest on old WWW
There may still be pre-manifest.js.gz versions of PublicInbox::WWW running and serving v2 inboxes. While -clone and "add-external --mirror" were working, -fetch was failing due to 301 redirect to $INBOX_URL/manifest.js.gz/ and not the expected 404. Update the code to deal with a JSON decode error (from the 301) and ensure v2 epochs detection is correct (and not using a shadowed variable).
2021-09-24 clone|fetch|--mirror: cull manifest in partial mirrors
This makes it easier for users to enable fetching on a previously read-only epoch. Prior to this change, users were required to delete manifest.js.gz in addition to adding the writable bit. Now, they just have to "chmod +w $EPOCH_DIR".
2021-09-24 clone|--mirror: fix and test against pre-manifest WWW
There may still be pre-manifest.js.gz versions of PublicInbox::WWW running and serving v2 inboxes. Since $INBOX_URL/manifest.js.gz was not understood, it was assumed to be a Message-ID and 301-ed to "$INBOX_URL/manifest.js.gz/" with a trailing slash, so our 404 checks were invalid. Update our fallbacks to deal with 301 by catching JSON decoding errors to trigger HTML scraping. For HTML parsing, be sure to not be fooled by potential user-generated content and only scan the part after the last <hr>. We also need to avoid propagating $? from curl unnecessarily when we can continue safely. Finally, update v2mirror.t with tests to use PublicInbox::WWW from our "v1.1.0-pre1" tag to ensure these code paths get tested.
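The fallback described above (treat a manifest decode failure as "no manifest", then scrape only the page footer) might look roughly like this in Python. Everything here is an illustrative assumption rather than the project's Perl code: the function names, the idea that epochs appear in the footer as "$INBOX_URL/N" URLs, and the regular expression used to find them.

```python
import gzip
import json
import re
import zlib


def parse_manifest_or_none(body):
    """Decode a gzipped JSON manifest, or return None if body is not one.

    An old server that answers with a redirect or an HTML page yields
    bytes that fail either the gzip or the JSON step; both count as
    "no manifest here" so the caller can fall back to scraping.
    """
    try:
        return json.loads(gzip.decompress(body))
    except (OSError, EOFError, ValueError, zlib.error):
        return None


def footer_epochs(html, inbox_url):
    """Return epoch numbers found in URLs after the last <hr> in html.

    Only the footer is scanned so that user-generated content earlier in
    the page cannot inject fake epoch URLs. The URL shape is hypothetical.
    """
    if "<hr>" not in html:
        return []
    footer = html.rpartition("<hr>")[2]
    pattern = re.compile(re.escape(inbox_url.rstrip("/")) + r"/(\d+)\b")
    return sorted({int(m.group(1)) for m in pattern.finditer(footer)})
```

Scanning only the text after the last `<hr>` means a message body that happens to contain a matching URL is ignored, since message content is rendered above the footer.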
2021-09-24 clone|--mirror: support --epoch=RANGE for partial clones
Partial (v2) clones should be a useful addition for users wanting to conserve storage while having fast access to recent messages. Continuing work started in 876e74283ff3 (fetch: ignore non-writable epoch dirs, 2021-09-17), this creates bare, read-only epoch git repos. These git repos have the remotes pre-configured, but do not fetch any objects. The goal is to allow users to set the writable bit on a previously-skipped epoch and start fetching it. Shell completion support may not be necessary given how short the epoch ranges are, here. Cc: Luis Chamberlain <mcgrof@kernel.org> Link: https://public-inbox.org/meta/20210917002204.GA13112@dcvr/T/#u
2021-09-23 xcpdb: -R$SHARDS creates new shards with correct perms
"Correct" meaning the permissions match that of the parent xap15 or ei15 directory.
2021-09-23 daemons: revamp periodic cleanup task
Neither Inboxes nor ExtSearch objects were retrying correctly when there are live git processes, but the inboxes were getting rescanned for search or other reasons. Ensure the scan retries eventually if there are live processes. We also need to update the cleanup task to detect Xapian shard count changes, since Xapian ->reopen is enough to detect any other Xapian changes. Otherwise, we just issue an inexpensive ->reopen call and let Xapian check whether there's anything worth reopening. This also lets us eliminate the Devel::Peek dependency.
2021-09-22 treewide: fix %SIG localization, harder
This fixes the occasional t/lei-sigpipe.t infinite loop under "make check-run". Link: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> Followup-to: b552bb9150775fe4 ("daemon+watch: fix localization of %SIG for non-signalfd users")
2021-09-21 t/lei-up: use '-q' to silence non-redirected test
We could redirect, too, but just use -q since we don't care for the output with run_mode => 0.
2021-09-21 lei q: update messages to reflect --save default
I wanted to try --dedupe=none for something, but it failed since I forgot --no-save :x So hint users towards --no-save if necessary.
2021-09-21 lei q: show progress on >1s preparation phase
Overwriting existing destinations is safe (but slow) by default, so show a progress message noting what we're doing while a user waits.
2021-09-21 lei lcat: support NNTP URLs
NNTP URLs are probably more prevalent in public message archives than IMAP URLs.
2021-09-21 lei inspect: support NNTP URLs
No reason not to support them, since there are more public-inbox-nntpd instances than -imapd instances, currently.
2021-09-19 net_reader: NNTP: remove article numbers from mail_sync folders
NNTP article numbers are stored separately from folder names in mail_sync.sqlite3. Recovering from this is optional; the worst case is wasting bandwidth refetching some messages. To (optionally) recover from this, use: "lei forget-mail-sync $URL_WITH_ARTNUMS". Some articles will be refetched on the next import, but duplicate data won't be indexed in Xapian.
2021-09-19 lei config --edit: use controlling terminal
As with "lei edit-search", "lei config --edit" may spawn an interactive editor which works best from the terminal running script/lei. So implement LeiConfig as a superclass of LeiEditSearch so the two commands can share the same verification hooks and retry logic.
2021-09-19 net_reader: no STARTTLS for IMAP localhost or onions
At least not by default, to match existing NNTP behavior. Tor .onions are already encrypted, and there's no point in encrypting traffic on localhost outside of testing.
2021-09-19 net_reader: fix single NNTP article fetch, test ranges
While NNTP ranges were already working, fetching a single message was broken. We'll also simplify the code a bit and ensure incremental synchronization is ignored when ranges are specified.
2021-09-19 ipc: drop dynamic WQ process counts
In retrospect, I don't think it's needed; and trying to wire up a user interface for lei to manage process counts doesn't seem worthwhile. It could be resurrected for public-facing daemon use in the future, but that's what version control systems are for. This also lets us automatically avoid setting up broadcast sockets. Followup-to: 7b7939d47b336fb7 ("lei: lock worker counts")
2021-09-19 ipc: wq_do: support synchronous waits and responses
This brings the wq_* SOCK_SEQPACKET API functionality on par with the ipc_do (pipe-based) API.
2021-09-19 t/lei-refresh-mail-sync: improve test reliability
We can't assume -imapd will be ready by the time we try to connect to it after restart when using "-l $ADDR". So recreate the (closed-for-testing) listen socket in the parent and hand it off to -imapd as we do normally.
2021-09-19 t/config: extra test for imap_url with imaps://
I configured this for public-inbox.org, but wasn't 100% sure it worked. This test ensures it stays working :>
2021-09-18 lei up: automatically use dt: for remote externals
Since we can't use maxuid for remote externals, automatically maintaining the last time we got results and appending a dt: range to the query will prevent HTTP(S) responses from getting too big. We could be using "rt:", but no stable release of public-inbox supports it, yet, so we'll use dt:, instead. By default, there's a two day fudge factor to account for MTA downtime and delays, which is hopefully enough. The fudge factor may be changed per-invocation with the --remote-fudge-factor=INTERVAL option. Since different externals can have different message transport routes, "lastresult" entries are stored on a per-external basis.
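As a rough illustration of the arithmetic involved, the Python sketch below subtracts a fudge interval from a stored "last result" time and renders an open-ended dt: term. The function name `dt_range`, the use of epoch seconds for the stored time, and the open-ended `YYYYMMDDhhmmss..` form are assumptions made for the example; the commit does not spell out how lei composes the query.

```python
from datetime import datetime, timedelta, timezone


def dt_range(last_result_epoch, fudge=timedelta(days=2)):
    """Return a 'dt:START..' term beginning `fudge` before the last result.

    last_result_epoch is seconds since the Epoch (hypothetical storage
    format); START is rendered in UTC as YYYYMMDDhhmmss.
    """
    start = datetime.fromtimestamp(last_result_epoch, tz=timezone.utc) - fudge
    return "dt:" + start.strftime("%Y%m%d%H%M%S") + ".."
```

For a last result at 2021-09-18 00:00:00 UTC (epoch 1631923200) and the default two-day fudge, this yields `dt:20210916000000..`. Keeping one stored time per external lets a slow transport route for one external widen only that external's query.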
2021-09-17 fetch: ignore non-writable epoch dirs
This will eventually be useful for maintaining partial mirrors. Keeping in line with the original public-inbox-fetch philosophy, there are no additional config files to manage: the user merely needs to remove write permissions to an $N.git directory to prevent it from being updated. Re-enabling updates just requires restoring write permission.
2021-09-17 search: fix rt: w/ approxidate when TZ != UTC
While git respects a user's local timezone and returns seconds-since-the-Epoch, we were unnecessarily and incorrectly calling gmtime+strftime on its result. So skip calling gmtime+strftime when the strftime format is "%s" and just feed the output time from git directly to Xapian. This is mainly for lei, which will likely run in a variety of timezones. While we're at it, add a recommendation to use TZ=UTC in public-inbox-httpd, in case there are (misguided :P) sysadmins who set a non-UTC TZ.
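Why the extra gmtime+strftime("%s") step goes wrong outside UTC can be shown with a small Python model. strftime("%s") interprets a broken-down time as local time, so a UTC broken-down time gets shifted by the local UTC offset. The function below models that with an explicit timezone argument instead of the process TZ; its name and parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def buggy_roundtrip(epoch, local_tz):
    """Model gmtime() followed by strftime("%s") in a given local zone.

    Step 1 breaks the epoch down as UTC and drops the zone (gmtime).
    Step 2 reinterprets those wall-clock fields as local time, which is
    what "%s" does, and converts back to epoch seconds.
    """
    broken_down = datetime.fromtimestamp(epoch, tz=timezone.utc).replace(tzinfo=None)
    return int(broken_down.replace(tzinfo=local_tz).timestamp())
```

With a local zone of UTC-5, the round trip returns a value 18000 seconds later than the input, while with UTC it returns the input unchanged. Passing the epoch seconds through untouched avoids the shift entirely.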
2021-09-17 lei refresh-mail-sync: drop old IMAP folder info
Like with Maildir, IMAP folders can be deleted entirely. Ensure they can be eliminated, but don't be fooled into removing them if they're temporarily unreachable.