path: root/lib/PublicInbox/Config.pm
Date Commit message
2024-03-08 dedupe inbox names, coderepo nicks + git dirs
Inbox names, coderepo nicks, and git_dir values are used heavily as hash keys by the read-only coderepo WWW pieces. Relying on CoW for mutable scalars on newer Perl doesn't work well since CoW for those scalars is limited to 256 CoW references, and we blow past that number when mapping thousands of coderepos and inboxes to each other. Instead, make the hash key up-front and get the resulting string to point directly to the pointer used by the hash key.
2024-02-14 www: cgit: support non-standard cgitrc locations
If publicinbox.cgitrc is set in the config file, we'll ensure cgit sees it as CGIT_CONFIG since the configured publicinbox.cgitrc knob may not be the default path the cgit.cgi binary was configured to use. Furthermore, we'll respect CGIT_CONFIG in the environment if publicinbox.cgitrc is unset in the config file at -httpd/-netd startup.
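The precedence described above can be sketched as follows. This is a minimal illustration in Python, not the project's Perl code: the `cgit_env` helper and the plain-dict shape of `cfg` are assumptions made for the example.

```python
import os


def cgit_env(cfg, environ=None):
    """Return the environment to hand to cgit.cgi.

    cfg is a plain dict standing in for the parsed config file
    (hypothetical shape); environ defaults to os.environ.
    """
    env = dict(os.environ if environ is None else environ)
    cgitrc = cfg.get("publicinbox.cgitrc")
    if cgitrc is not None:
        # The config file wins: cgit must read the same file we parsed.
        env["CGIT_CONFIG"] = cgitrc
    # Otherwise any CGIT_CONFIG already in the environment is left untouched.
    return env
```

Copying the environment rather than mutating it keeps the sketch free of side effects; whether the real daemon does the same is not stated in the commit message.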
2024-01-17 config: glob2re: fix over-matching /**/foo
Noticed while adding wildcard support to WwwCoderepo...
2024-01-17 config: don't vivify invalid fields for coderepos
We don't need 404s for non-existent coderepos creating fake (and invalid) entries. I noticed this while working on subsequent changes to support globbing in URLs.
2023-11-30 config: reject newlines consistently in dir names
Explicitly drop support for "\n" in git coderepo pathnames, as we already do elsewhere. Gcf2 (our libgit2 helper) was always broken with "\n" in pathnames, and I'm not sure if cgit config files work with them, either. Dealing with newline characters requires extra complexity that I'm not willing to deal with when managing alternates files.
2023-11-29 www: load and use cindex join data
This is a major step in solving the problem of having to manually associate hundreds/thousands of coderepos with hundreds/thousands of public-inboxes to power solver (and more).
2023-11-14 config: avoid eidx_key and newsgroup conflicts
Start lowercasing newsgroup names automatically since uppercase names are incompatible with IMAP and POP3 and also cause problems with both -extindex and -cindex. We'll also warn on eidx_key and newsgroup conflicts to avoid sometimes subtle breakage when using -extindex and -cindex.
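A rough sketch of the idea, in Python rather than the project's Perl. The exact conflict rules between eidx_key and newsgroup are not spelled out in this message, so the example assumes a single shared namespace in which the first definition wins; `normalize_newsgroups` and its input shape are hypothetical.

```python
def normalize_newsgroups(inboxes):
    """inboxes: list of (inbox_name, newsgroup) pairs in config order.

    Returns (mapping of lowercased newsgroup -> inbox name,
             list of warning strings for conflicting entries).
    """
    seen = {}
    warnings = []
    for name, newsgroup in inboxes:
        ng = newsgroup.lower()  # uppercase names are unusable over IMAP/POP3
        if ng in seen:
            warnings.append(f"{name}: newsgroup {ng} already used by {seen[ng]}")
            continue  # assumption: the first definition wins
        seen[ng] = name
    return seen, warnings
```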
2023-11-03 treewide: use ->close to call ProcessIO->CLOSE
This will open the door for us to drop `tie' usage from ProcessIO completely in favor of OO method dispatch. While OO method dispatches (e.g. `$fh->close') are slower than normal subroutine calls, it hardly matters in this case since process teardown is a fairly rare operation and we continue to use `close($fh)' for Maildir writes.
2023-10-28 treewide: use run_qx where appropriate
This saves us some code, and is a small step towards getting ProcessIO working with stat, fcntl and other perlops that don't work with tied handles.
2023-10-25 limiter: split out from qspawn
It's slightly better organized this way, especially since `publicinboxLimiter' has its own user-facing config section and knobs. I may use it in LeiMirror and CodeSearchIdx for process management.
2023-10-03 config: fix key-only truthy values with urlmatch
When using --get-urlmatch, we need a way to distinguish between a key-only entry and a `key=val' pair, even if the `val' is empty. In other words, git interprets `-c imap.debug' as true and `-c imap.debug=' as false, but an untyped --get-urlmatch invocation has no way to distinguish between them. So we must specify we want `--bool' (we're avoiding `--type=bool' since that only appears in git 2.18+). Fixes: f170d220f876 (lei: fix `-c NAME=VALUE' config support)
2023-09-24 config: drop scalar ref support from internal API
It's a needless branch to maintain exclusively for our tests. The `git config -l' output isn't pleasant to write in tests, anyways. So just use heredocs to write git configs in their native format rather than emulate the output of `git config -l'. This does make the test suite do more work with temporary files and process invocations, but it doesn't seem very measurable when testing on tmpfs (TMPDIR=/dev/shm). We'll make a minor improvement to TestCommon::tmpdir by allowing it to return a single value (which I suspect we can rely on in more places since File::Temp::Dir overloads stringification).
2023-09-24 lei: fix `-c NAME=VALUE' config support
We can pass `-c NAME=VALUE' args directly to git-config without needing a temporary directory or file. Furthermore, this opens the door to us being able to correctly handle `-c NAME=VALUE' after `delete $lei->{cfg}' if we need to reload the config during a command. This tightens up error-checking for `lei config' and ensures we can make config settings changes while using `-c NAME=VALUE' instead of editing the temporary file. The non-obvious part was avoiding the use of the -f/--file arg for `git config' for read-only operations and instead relying on `-c include.path=$ABS_PATH'. This is done by parsing the switches to be passed to `git config' to determine if it's a read-only operation or not.
2023-09-24 config: handle key-only entries as booleans
It's how git-config works, so our `git config --list' parser must be able to handle it. Fortunately this doesn't seem to incur a measurable overhead when parsing a config with 50k inboxes.
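To illustrate what a key-only entry looks like to a parser, here is a small Python sketch (the project's parser is Perl and is not reproduced here). It assumes the NUL-terminated `git config -z --list` form, where key and value are separated by a newline and a key-only entry has no newline at all; `parse_config_z` and the choice of `True` as the stored value are assumptions for the example.

```python
def parse_config_z(data):
    """Parse `git config -z --list` style output into a dict of lists."""
    cfg = {}
    for entry in data.split("\0"):
        if not entry:
            continue  # skip the empty piece after the final terminator
        key, sep, val = entry.partition("\n")
        # No "\n" separator means a key-only entry, which git treats as true;
        # "key\n" (separator present, empty value) stays an empty string.
        cfg.setdefault(key, []).append(val if sep else True)
    return cfg
```

Keeping the empty-string case distinct from the key-only case is the point: `imap.debug` alone is true, while `imap.debug=` is false.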
2023-08-24 cindex: read-only association dump
This will eventually allow associating coderepos with inboxes and vice-versa; avoiding the need for manual configuration via tedious publicinbox.*.coderepo directives. I'm not sure how this should be stored for WWW, yet, but it's required since it takes about 8 hours to do this fully across lore and git.kernel.org.
2023-04-18 www_coderepo: rescan cgit project-list for new coderepos
Coderepo changes are probably more common than inbox changes, so it probably makes sense to rescan and look for new coderepos on 404s, especially since we serve mirrored manifest.js.gz as-is. I noticed my git.kernel.org mirror was serving manifest.js.gz pointing to irretrievable repositories. This should stop that. We'll also drop the underscore ('_') and use `coderepo' everywhere to be consistent with our documentation. We may serve new inboxes in a similar way down the line, too; but this change only affects coderepos for now since we can guarantee the inbox manifest.js.gz never contains irretrievable inboxes as it's dynamically generated.
2023-04-07 config: describe order dependency in cgitrc parsing
I just had to do a double-take here and look back at cgit.c to see the ordering dependency wasn't a bug I introduced, but mimicking what cgit does.
2023-03-25 admin: ensure resolved GIT_DIR is absolute
We'll also support the $base arg of File::Spec->rel2abs since it should make codesearch indexing easier.
2023-03-18 config: glob2re supports `**' to match multiple path components
This should match the behavior documented in gitglossary(7).
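For readers unfamiliar with the `**' rules in gitglossary(7), the following Python sketch shows one way to translate such a glob into a regular expression. It is not the project's glob2re (which is Perl); it handles only `*', `?' and the three documented `**' forms, and treats any other run of asterisks as ordinary single-component wildcards, which is an assumption of this example.

```python
import re


def glob2re(glob):
    """Translate a path glob into a regex string (subset: *, ?, **)."""
    out = []
    i, n = 0, len(glob)
    while i < n:
        if glob.startswith("**/", i) and (i == 0 or glob[i - 1] == "/"):
            out.append("(?:.*/)?")  # leading "**/" or "/**/": zero or more directories
            i += 3
        elif glob.startswith("/**", i) and i + 3 == n:
            out.append("/.*")       # trailing "/**": everything inside
            i += 3
        elif glob[i] == "*":
            out.append("[^/]*")     # "*" never crosses a path separator
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return "".join(out)
```

With this translation, `a/**/b` matches `a/b` and `a/x/y/b` but not `a/xb`, because the optional directory group must end in a slash.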
2023-03-18 treewide: move glob2re to PublicInbox::Config
It seems suitable for the config class since globs are a config/option thing.
2023-01-11 config: use inbox names to map inboxes <-> coderepos
We can avoid having to deal with weakening references and then later creating strong references in WwwCoderepo.
2023-01-08 config: do not implicitly set coderepo.*.cgiturl
It's a needless waste of memory and this change reduces the WwwCoderepo object size by over 25% with over 1K repos. Using the following check:

    perl -MDevel::Size=total_size -I lib -MPublicInbox::WwwCoderepo -E \
      'say total_size(PublicInbox::WwwCoderepo->new(PublicInbox::Config->new))'

    before: 1612515
    after:  1184385
2023-01-01 www: load cgitrc for coderepos for solver
Loading cgitrc (and the associated projects.list) can save users from defining as many individual coderepos. xt/solver.t needs a use of `$_' replaced since that gets clobbered while parsing cgitrc.
2022-11-23 lei_curl: use http.proxy config from git if available
Since HTTP(S) URLs hit by lei or public-inbox-{clone,fetch} are expected to be git endpoints anyways, fall back to using http.proxy from git configs to save the user from having to maintain the same configuration for different things.
2022-11-23 config: urlmatch $? does not influence our exits
We don't want to leak $? from `git config' failures into lei or public-inbox-* processes.
2022-10-09 www_coderepo: allow searching one extindex|inbox
I'm not sure how to best make a UI for one coderepo to many inboxes/extindices, yet; but at least allow a simple 1:1 mapping, for now. This ensures /$CODEREPO/$OID/s/ can work as effectively as /$INBOX/$OID/s/ when looking for emails associated with a git commit.
2022-10-09 www_coderepo: wire up snapshots from summary
This also ensures we won't waste CPU cycles on snapshots which aren't configured if somebody attempts them by guessing URLs.
2022-10-09 config: remove {-cgitrc_unparsed} field
This field has been unneeded since commit 6890430df808 (cgit: fix fallout from lazy coderepo loading, 2021-03-18)
2022-10-05 www_coderepo: an alternative to cgit
This will allow it to easily map a single coderepo to multiple inboxes (or multiple coderepos to any number of inboxes). For now, this is just a summary, but $REPO/$OID/s/ support will be added, along with archive downloads. Indexing of coderepos will probably be supported via -extindex, only.
2022-08-23 config: fix confusing space in ->repo_objs
It's actually valid Perl to have "$foo ->{field} = ..." but it's confusing and I noticed it while tracking down a configuration error.
2022-08-06 daemon: dedupe PublicInbox::Config objects by pathname
This means all Inbox, Git, Over, Msgmap, Search objects also get deduplicated if they belong to the same config file, reducing memory and FD usage. This helps save memory and improve cache hit rates in -netd setups where NNTP, IMAP, HTTP, and POP3 servers run in the same process. InboxIdle was the only bit which needed adjustment, but there may be other bugs lurking despite all tests passing.
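The deduplication idea can be sketched as a cache keyed by config pathname. This is a Python illustration only: the `Config` stand-in, the `config_for` helper, and the module-level cache are assumptions, and the real daemon's object lifetimes may differ.

```python
class Config:
    """Stand-in for a parsed config; real parsing is omitted."""

    def __init__(self, path):
        self.path = path
        self.inboxes = {}  # would be shared by every server using this config


_configs = {}  # pathname -> Config


def config_for(path):
    """Return one shared Config object per pathname."""
    cfg = _configs.get(path)
    if cfg is None:
        cfg = _configs[path] = Config(path)
    return cfg
```

Because every listener asking for the same pathname gets the same object, anything hanging off that object is shared as well, which is where the memory and FD savings come from.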
2022-07-20 public-inbox-pop3d - a mostly read-only POP3 server
Old account expiry has not been implemented, but it seems to work well with both mpop(1) and getmail(1). The strictness of mpop was particularly helpful in ironing out bugs in our implementation of (dreaded) message sequence numbers. "EXPIRE 0" (RFC 2449) can theoretically save numerous "DELE" commands, but that's untested by real-world clients. mpop supports PIPELINING which is effective in hiding latency, and the core networking functionality is already well-tested from our NNTP and IMAP implementations. Configuration requires "publicinbox.pop3state" to point to a directory writable by the otherwise read-only daemon. See public-inbox-pop3d(1) manpage for more usage details.
2021-11-01 treewide: kill problematic "$h->{k} //= do {" assignments
As stated in the previous change, conditional hash assignments which trigger other hash assignments seem problematic, at times. So replace:

    $h->{k} //= do { $h->{x} = ...; $val };

with:

    $h->{k} // do { $h->{x} = ...; $h->{k} = $val };

"||=" is affected the same way, and some instances of "||=" are replaced with "//=" or "// do {", now.
2021-10-23 www: respect coderepo.*.url during cgit init
This is necessary for showing "found $OID in $CODEREPO_URL" in solver-generated pages ($INBOX_URL/$OID/s/).
2021-10-23 config: remove *_url_format support for cgit
We're not using them, anywhere.
2021-09-27 config: get_1: use full parameter name
Instead of passing the prefix section and key separately, pass them together as is commonly done with git-config(1) usage as well as our ->get_all API. This inconsistency in the get_1 API is a needless footgun and confused me a bit while working on "lei up" the other week.
2021-09-16 www: support publicinbox.imapserver
This allows PublicInbox::WWW hosts to advertise the existence of IMAP servers in addition to NNTP servers.
2021-09-16 inbox: streamline ->nntp_url
We no longer waste a precious hash slot for a per-Inbox {nntpserver} if it's only configured globally for all inboxes.
2021-09-11 lei: pass client stderr to git-config in more places
This should improve the users' chances of seeing errors in various git config files we use.
2021-09-11 lei: fix handling of broken lei.saved-search config files
lei shouldn't become unusable if a config file is invalid. Instead, show the "git config" stderr and attempt to continue gracefully. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210910141157.6u5adehpx7wftkor@meerkat.local/
2021-08-28 config: do not parse altid for extindex
There's currently no support for altid with extindex, and there's likely no legacy precedent for using altid like there is with single public-inboxes.
2021-07-22 extsearch: support publicinbox.*.boost parameter
This behaves identically to the lei external "boost" parameter in prioritizing raw messages for extindex. Relying exclusively on the config file order doesn't work well for mirrors since it's impossible to guarantee config file ordering via grokmirror hooks. Config file ordering remains the default if boost is unconfigured, or in case of ties. Note: I chose the name "boost" rather than "priority" or "rank" since I always get confused by whether higher or lower numbers take precedence when it comes to kernel scheduling. "weight" is also a part of Xapian API terminology, which we currently do not expose to configuration (but may in the future).
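The ordering rule (boost first, config file order as the default and the tie-breaker) maps naturally onto a stable sort. A minimal Python sketch, assuming that a higher boost takes precedence and that an unset boost counts as 0; neither detail is confirmed by this message, and `order_by_boost` is a hypothetical helper.

```python
def order_by_boost(inboxes):
    """inboxes: list of (name, boost_or_None) pairs in config-file order."""
    # sorted() is stable, so entries with equal (or missing) boost
    # keep their relative config-file order.
    ranked = sorted(inboxes, key=lambda ib: -(ib[1] or 0))
    return [name for name, _boost in ranked]
```

A stable sort means no explicit tie-breaking key is needed: the input order already encodes the config file order.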
2021-07-18 config: s/_one_val/get_1/ for public use
We'll be using this in lei for watch configs.
2021-06-23 www_listing: start updating for pagination + search
When dealing with thousands of inboxes, displaying all of them on a single page isn't going to work. So steal some pagination and search results code from the message search to generate some basic HTML output that looks good in w3m.
2021-04-30 lei_curl: improve correctness of LD_PRELOAD check
LD_PRELOAD sent by a client can't affect lei-daemon.
2021-04-19 config: git_config_dump blesses
I don't know if it's worth it to sub (or super)class PublicInbox::Config into something more generic for lei, but this change simplifies a good chunk of lei code that reuses the public-inbox config parsing.
2021-04-17 lei q --save: clobber config file on repeats
A user may wish to clobber/refine existing search parameters by issuing "lei q --save" again. Support that by overwriting the lei.saved-search state file entirely. We continue to preserve over.sqlite3 for deduplication purposes. This way, we don't get something redundant like:

    [lei]
        q = term1
        q = term2
        q = term1
        q = term2
        q = term3

...whenever a user wants to refine their search. Instead, we'll just have:

    [lei]
        q = term1
        q = term2
        q = term3

on the second go.
2021-03-19 config: ignore extindex entries with newlines in paths
git 2.11 and earlier could not handle git directories with newlines in them, nor does libgit2 support them. Followup-to: d87dd0e679587043 ("config: reject `\n' in `inboxdir'")
2021-03-19 cgit: fix fallout from lazy coderepo loading
We can't completely instantiate our cgit wrapper without knowing cgit locations for serving static content. Fixes: a5968dc059f655a ("config: lazy-load coderepos, support extindex")
2021-03-17 config: lazy-load coderepos, support extindex
Extsearch objects are duck-types of Inbox objects, and are capable of supporting code repos all the same.