path: root/script
Date Commit message
2020-12-26 index: filter out indexlevel=basic from extindex
extindex users will likely want to use indexlevel=basic for per-inbox indices, however extindex itself doesn't support basic index level (yet?). Let's ensure we don't trip up extindex users who specify "-L basic" on the -index command-line.
2020-12-26 index: fix --no-fsync flag propagation to extindex
Negation in flag names is confusing, but trying to deviate from the DB_NO_SYNC name used by Xapian is also confusing.
2020-12-26 index: do not attach inbox to extindex unless updated
We'll count the number of log changes (regardless of index or unindex) and only attach inboxes to ExtSearchIdx objects when they get new work. We'll also reduce lock bouncing and only update external indices after all per-inbox indexing is done. This also updates existing v2 indexing/unindexing callers to be more consistent and ensures unindex log entries update per-inbox last commit information.
2020-12-26 index: disable --fast-noop on --reindex
These options make no sense when used together; just inform the user and move on, since it's probably harmless to continue.
2020-12-26 init: use the return value of rel2abs_collapsed
:x Fixes: 9fcce78e40b0a7c6 ("script/public-inbox-*: favor caller-provided pathnames")
2020-12-25 index: support --fast-noop / -F switch
Note: I'm not sure if it's worth documenting and supporting this long-term. We can avoid taking locks for invocations of "index --all" and rely on high-resolution ctime (struct timespec st_ctim) comparisons of msgmap.sqlite3 and the packed-refs + refs/heads directory of the newest epoch. This cuts public-inbox-index invocations with "--all --no-update-extindex -L basic" down from 0.92s to 0.31s. The change with "-L medium" or "-L full" and (default) non-zero jobs is even more drastic, reducing a 12-13s no-op invocation down to the same 0.31s.
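The lock-free check described in the entry above can be illustrated with a minimal Python sketch. This is not the project's Perl code: the function names and the rule "every source ctime is less than or equal to the index ctime" are assumptions made for illustration only.

```python
import os
from typing import Iterable


def ctime_ns(path: str) -> int:
    """Return the inode change time of `path` in nanoseconds."""
    return os.stat(path).st_ctime_ns


def is_noop(index_ctime_ns: int, source_ctimes_ns: Iterable[int]) -> bool:
    """True when no source changed after the index file did.

    Assumption (hypothetical rule): "unchanged" means every source
    ctime is less than or equal to the index ctime.
    """
    return all(src <= index_ctime_ns for src in source_ctimes_ns)


def can_skip(index_file: str, sources: Iterable[str]) -> bool:
    """Compare on-disk ctimes only; no lock is taken and no file is opened."""
    return is_noop(ctime_ns(index_file), (ctime_ns(p) for p in sources))
```

A caller would pass the index file and the files or directories it depends on; when `can_skip` returns true, the run can exit before any lock is taken.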
2020-12-25 inboxwritable: delay umask_prepare calls
This simplifies all ->with_umask callers and opens the door for further optimizations to delay/elide process spawning.
2020-12-24 index: update [extindex "all"] by default, support -E
In most cases, this ensures users will only have to opt-in to using -extindex once and won't have to issue extra commands to keep external indices up-to-date when using public-inbox-index. Since we support arbitrary numbers of external indices for ease-of-development, we'll support repeating "-E" ("--update-extindex=") in case users want to test changes in parallel.
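The option handling described above (a default of "all", a repeatable "-E", and the "--no-update-extindex" switch mentioned in a later entry) can be sketched with Python's argparse. The real command is written in Perl; the program name and the function `parse_extindex` here are hypothetical.

```python
import argparse
from typing import List, Optional


def parse_extindex(argv: Optional[List[str]] = None) -> List[str]:
    """Return the names of the external indices to update."""
    p = argparse.ArgumentParser(prog="index-sketch")
    # action="append" collects every repetition of -E into one list.
    p.add_argument("-E", "--update-extindex", action="append",
                   metavar="NAME", default=None)
    p.add_argument("--no-update-extindex", action="store_true")
    args = p.parse_args(argv)
    if args.no_update_extindex:
        return []
    # Fall back to "all" only when the user named no index explicitly.
    return args.update_extindex or ["all"]
```

The default is left as `None` and filled in afterwards, because argparse would otherwise append user-supplied values to a default list instead of replacing it.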
2020-12-21 use rel2abs_collapsed when loading Inbox objects
We need to canonicalize paths for inboxes which do not have a newsgroup defined, otherwise ->eidx_key matches can fail in unexpected ways.
2020-12-20 script/public-inbox-*: favor caller-provided pathnames
We'll try to avoid calling Cwd::abs_path and use File::Spec->rel2abs instead, since abs_path will resolve symlinks the user specified on the command-line. Unfortunately, ->rel2abs still leaves "/.." and "/../" uncollapsed, so we still need to fall back to Cwd::abs_path in those cases. While we are at it, we'll also resolve inboxdir from deep inside v2 directories instead of misdetecting them as v1 bare git repos. In any case, stop matching directories by name and instead rely on the unique combination of st_dev + st_ino on stat() as we started doing in the extindex code.
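The two ideas in the entry above are keeping the caller's pathname unless it contains "..", and identifying directories by st_dev + st_ino rather than by name. Both can be sketched in Python. The function names are hypothetical, and `os.path.realpath` stands in for Cwd::abs_path.

```python
import os
from typing import Optional


def rel2abs_fallback(path: str, base: Optional[str] = None) -> str:
    """Make `path` absolute, keeping user-specified symlinks when possible."""
    base = os.getcwd() if base is None else base
    absolute = path if os.path.isabs(path) else os.path.join(base, path)
    if ".." in absolute.split(os.sep):
        # Dropping ".." lexically can be wrong across symlinks, so let the
        # filesystem resolve the path (this also resolves symlinks).
        return os.path.realpath(absolute)
    return absolute


def same_dir(a: str, b: str) -> bool:
    """Two paths name the same directory if device and inode both match."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
```

With this approach a symlinked inbox directory keeps the name the user typed, while `same_dir` still recognizes it as the same directory as its target.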
2020-12-09 extindex: do not use current dir like -index does
At least not for resolving inboxes, since there's no good way for a user to specify what is an inbox or extindex directory without a command-line switch. Instead of changing the -extindex command, we change the -index command internals to rely on the new {-use_cwd} flag to avoid internal use of negation, since double-negatives and the like are confusing to me.
2020-12-09 rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places for now, since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object.
2020-11-29 extindex: support `--gc' to remove dead inboxes
Inboxes may be removed or newsgroups renamed over time. Introduce a switch to do garbage collection and eliminate stale search and xref3 results based on inboxes which remain in the config file. This may also fixup stale results leftover from any bugs which may leave stale data around. This is also useful in case a clumsy BOFH (me :P) is swapping between several PI_CONFIGs and accidentally indexed a bunch of inboxes they didn't intend to.
2020-11-28 *index: more consistent graceful shutdown checks
v1 and v2 inbox indexing now supports graceful shutdown checks just like ExtSearchIdx. Additionally, we'll consistently perform quit checks at the top of loops. Interaction with the --xapian-only and --sequential-shard options is a bit lacking, and will warn the user to use "--reindex --xapian-only" to fix.
2020-11-24 miscidx: put grokmirror manifest entries in Xapian docdata
This should make it possible for us to quickly generate manifest.js.gz files with less random I/O and process spawning in the WWW code.
2020-11-19 extindex: remove skip-docdata option
Since extindex is entirely new, it doesn't have backwards compatibility concerns and never stored docdata, anyways.
2020-11-08 extindex: fix --batch-size support
Calling PublicInbox::Admin::index_prepare is required for --batch-size (k|m|g) modifiers and indexBatchSize in the config file. Otherwise, the default 1m batch size stuck and led to unexpectedly bad performance on a machine which could index v2 inboxes faster with larger batch sizes.
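To illustrate what the (k|m|g) modifiers above imply, here is a small Python sketch of a size parser. It assumes 1024-based multipliers, as git uses for its own config values; the function name `parse_batch_size` is hypothetical and this is not the project's implementation.

```python
import re

# Assumption: k/m/g scale by powers of 1024, as in git config values.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_SIZE_RE = re.compile(r"([0-9]+)([kmg]?)", re.IGNORECASE)


def parse_batch_size(text: str) -> int:
    """Convert strings such as "1m" or "512k" into a byte count."""
    m = _SIZE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2).lower()]
```

Under that assumption, the default "1m" mentioned above corresponds to 1048576 bytes. If the suffix is never parsed, a larger user-supplied value silently has no effect.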
2020-11-08 extindex: SIGUSR1 supports checkpoint
Matching the behavior of git-fast-import(1), we'll allow a user to send SIGUSR1 to checkpoint over.sqlite3 and Xapian.
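A signal-triggered checkpoint of this kind is commonly built from a flag set in the handler and a check at a safe point in the work loop. The Python sketch below shows that pattern under stated assumptions: `Checkpointer`, `install`, and `index_all` are hypothetical names, and the `flush` callback stands in for committing over.sqlite3 and Xapian.

```python
import signal
from typing import Callable, Iterable, List


class Checkpointer:
    def __init__(self, flush: Callable[[], None]) -> None:
        self._flush = flush
        self._requested = False

    def request(self, signum=None, frame=None) -> None:
        # A signal handler should do as little as possible: only set a flag.
        self._requested = True

    def maybe_checkpoint(self) -> bool:
        """Flush once if a checkpoint was requested since the last call."""
        if not self._requested:
            return False
        self._requested = False
        self._flush()
        return True


def install(cp: Checkpointer) -> None:
    """Route SIGUSR1 to the checkpointer (POSIX only, main thread only)."""
    signal.signal(signal.SIGUSR1, cp.request)


def index_all(items: Iterable[int], cp: Checkpointer, out: List[int]) -> None:
    for item in items:
        cp.maybe_checkpoint()  # safe point: between work items
        out.append(item)       # stand-in for indexing one message
```

Checking the flag at the top of the loop keeps the flush outside any half-finished unit of work.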
2020-11-08 v2writable: more accurate {current_info} warnings/progress
With async git blob retrievals, the OID being enqueued and the OID being processed can be totally unrelated and misleading. We'll also prefix $INBOX_DIR for v2, and not just the epoch since we could be indexing multiple inboxes via both -index and -extindex.
2020-11-08 extsearch: rename -eindex to -extindex
"eindex" rhymes with "reindex", which could be confusing; so rename the command and config prefix to "extindex", which is hopefully less confusing.
2020-11-07 index: eindex wiring
This doesn't do anything, yet, but it will once the rest of the eindex stuff works.
2020-11-07 script: add preliminary eindex implementation
Not documented, yet, but it runs...
2020-09-19 gcf2: wire up read-only daemons and rm -gcf2 script
It seems easiest to have a singleton Gcf2Client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes it easier to avoid mismatched PERL5LIB include paths (as noticed during development :x). We'll also make the existing cat-file process management infrastructure more resilient to BOFHs on process killing sprees (or in case our libgit2-based code fails on us). (Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd won't automatically benefit from this change, and extra configuration will be required (to be documented later).
2020-09-19 gcf2: require git dir with OID
This amortizes the cost of recreating PublicInbox::Gcf2 objects when alternates change in v2 all.git.
2020-09-19 gcf2: transparently retry on missing OID
Since we only get OIDs from trusted local data sources (over.sqlite3), we can safely retry within the -gcf2 process without worrying about clients spamming us with requests for invalid OIDs and triggering reopens.
2020-09-19 add gcf2 client and executable script
This should be able to replace multiple `git cat-file' for blob retrieval, but adjustments may be needed.
2020-09-14 sigfd: fix typos and scoping on systems w/o epoll+kqueue
Unfortunately, I'm not sure how easy it is to catch these at compile time. Prototypes do not seem to check these at compile time when crossing packages (not even with exported subroutines).
2020-09-02 init+convert: create non-existing directory hierarchies
Following "git init" as an example, we'll create every parent path up to the one specified, instead of attempting to continue on when Cwd::abs_path returns `undef'.
2020-09-02 watch: add --help/-h support
And avoid unnecessary POD markup in the man page.
2020-09-02 mda+learn: add --help / -h support
"use Getopt::Long" doesn't seem too slow on a hot page cache, and it's probably used frequently enough to be in cache. We'll also start reducing the amount of markup in the .pod and favoring verbatim text in documentation for readability in source form, since the bold text seems excessive.
2020-09-02 script/*: fold $usage into $help, support `-h' instead of -?
`-h' doesn't conflict with anything, and some users (including git users) may be more accustomed to using it rather than the rarely-seen-outside-of-Getopt::Long `-?' switch. We can also rely on the GetOptions() function to emit a proper error message instead of just "bad command-line args".
2020-09-02 edit+purge: support `--help' and `-h' like other commands
And while we're at it, note edit is *destructive* to encourage reading the fine manual.
2020-09-02 script/*: set executable bit on -learn and -imapd
It's useful to mark that they're meant to be executable, even if the shebang is useless.
2020-09-02 index: check for xapian-compact when using --compact
Otherwise, users may be frustrated to discover it missing after a long indexing run.
2020-09-01 watch: log signal activities to STDERR
Sometimes it may not be apparent when/if a signal is processed; this hopefully improves the situation. We'll also change the process title when we're quitting to better inform users.
2020-09-01 rename WatchMaildir => Watch
This is no longer limited to Maildirs now that IMAP and NNTP support exist; so give it a shorter name.
2020-08-20 init+index: support --skip-docdata for Xapian
Since we no longer read document data from Xapian, allow users to opt-out of storing it. This breaks compatibility with previous releases of public-inbox, but gives us a ~1.5% space savings on Xapian storage (and associated I/O and page cache pressure reduction).
2020-08-20 init: drop -N alias for --skip-artnum
It may be too easily confused for --newsgroup or --ng. This is too rarely used and never made it into a release, so it should be fine.
2020-08-20 init: support --newsgroup option
We can reduce the need to edit the config file for NNTP group names this way.
2020-08-20 init: support --help and -?
And speed those up with some lazy loading, too.
2020-08-20 compact: support --help/-? and perform lazy loading
This probably won't be used much, but --help can still make sense.
2020-08-14 index|compact|xcpdb: support --all switch
For -index, this is a convenient way to quickly index all inboxes after a grok-pull. Might as well support it for rarely used commands like -compact and -xcpdb, too.
2020-08-13 xcpdb: wire up new index options and --help
--sequential-shard also disables the copy parallelism (--jobs), so it can be useful on systems which are unable to handle parallel random I/O but still need many shards. There was a missing "use strict", too, which is fixed.
2020-08-13 xcpdb: support --no-fsync from CLI
This was omitted in 8b1950055d51d436 :x Fixes: 8b1950055d51d436 ("index+xcpdb: rename `--no-sync' to `--no-fsync'")
2020-08-10 convert: set No_COW on copied SQLite files
We'll use our existing logic and use sqlite_backup_from_file, which appeared in 1.39 (along with sqlite_backup_to_file).
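For readers unfamiliar with backup-based copying, Python's standard `sqlite3.Connection.backup` (Python 3.7+) exposes SQLite's online backup API, which, to my understanding, is also what the DBD::SQLite methods named above rely on. The sketch below shows only the copy step. The function name `copy_sqlite` is hypothetical, and setting the No_COW attribute on the destination is not shown.

```python
import sqlite3


def copy_sqlite(src_path: str, dst_path: str) -> None:
    """Copy a SQLite database through the backup API, not a raw file copy."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # copies every page of src into dst
    finally:
        dst.close()
        src.close()
```

Because the destination file is created first and then filled by SQLite, file attributes can in principle be applied to it before any data is written.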
2020-08-10 convert: check ARGV more correctly
Instead of silently ignoring excessive args, don't let a user specify an extra directory. Furthermore, we'll support the odd case where BOFH wants to name an $INBOX_DIR to be `0' :P
2020-08-10 convert: speed up --help
Lazy-loading dependencies speeds up --help by several hundred milliseconds and is a huge step towards user-friendliness.
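The lazy-loading idea above amounts to handling the help request before importing anything expensive. A minimal Python analogue follows; the usage string, the `main` function, and the use of `sqlite3` as a stand-in for a costly dependency are all assumptions for illustration.

```python
from typing import List

USAGE = "usage: convert-sketch [--help] OLD_DIR NEW_DIR"


def main(argv: List[str]) -> str:
    if "-h" in argv or "--help" in argv:
        return USAGE  # answered before any heavy import happens
    if len(argv) != 2:
        raise SystemExit(USAGE)  # reject missing or extra arguments
    # Deferred import: only paid for when real work is requested.
    import sqlite3
    old_dir, new_dir = argv
    return f"would convert {old_dir} -> {new_dir} (SQLite {sqlite3.sqlite_version})"
```

In Perl the same effect is typically obtained by replacing compile-time `use` with a runtime `require` placed after option parsing.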
2020-08-10 convert: support new -index options
Converting v1 inboxes to v2 can be a painful experience on HDD. Some of the new options in the CLI or config file make it less painful.
2020-08-10 index: cleanup internal variables
Move away from hard-to-read alllowercase naming and favor snake_case or separated-by-dashes. We'll keep `--indexlevel' as-is for now, since it's been around for several releases; but we'll support `--index-level' in the CLI and update our documentation in a few months. We'll also clarify that publicInbox.indexMaxSize is only intended for -index, and not -watch or -mda.
2020-08-10 avoid File::Temp::tempfile in more places
We can use open(..., undef) natively in Perl in t/import.t. In places where we need a pathname, the File::Temp OO API gives us auto-unlinking for free.
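The distinction drawn above is between an anonymous temporary file when no pathname is needed and an auto-unlinking named file when one is. Python's `tempfile` module has a close analogue, sketched here under the assumption that the helper names are purely illustrative.

```python
import os
import tempfile
from typing import Tuple


def scratch_roundtrip(data: bytes) -> bytes:
    """Anonymous temp file: no usable pathname, gone once closed."""
    with tempfile.TemporaryFile() as fh:
        fh.write(data)
        fh.seek(0)
        return fh.read()


def named_scratch(data: bytes) -> Tuple[str, int, bool]:
    """Named temp file: has a pathname and is unlinked automatically."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(data)
        tmp.flush()
        path = tmp.name
        size = os.path.getsize(path)  # the pathname is usable while open
    return path, size, os.path.exists(path)
```

`TemporaryFile` plays the role of Perl's open(..., undef), and `NamedTemporaryFile` resembles a File::Temp object that removes its file when it goes out of scope.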