Date | Commit message |
|
Inbox names, coderepo nicks, git_dir values are used heavily
as hash keys by the read-only coderepo WWW pieces.
Relying on CoW for mutable scalars on newer Perl doesn't work
well since CoW for those scalars is limited to 256 CoW references,
and we blow past that number when mapping thousands of coderepos
and inboxes to each other. Instead, make the hash key up-front
and get the resulting string to point directly to the pointer
used by the hash key.
|
|
If publicinbox.cgitrc is set in the config file, we'll ensure
cgit sees it as CGIT_CONFIG since the configured
publicinbox.cgitrc knob may not be the default path the cgit.cgi
binary was configured to use.
Furthermore, we'll respect CGIT_CONFIG in the environment if
publicinbox.cgitrc is unset in the config file at -httpd/-netd
startup.
|
|
Noticed while adding wildcard support to WwwCoderepo...
|
|
We don't need 404s for non-existent coderepos creating fake
(and invalid) entries. I noticed this while working on
subsequent changes to support globbing in URLs.
|
|
Explicitly drop support for "\n" in git coderepo pathnames as
we do elsewhere. Gcf2 (our libgit2 helper) was always
broken with "\n" in pathnames, and I'm not sure if cgit config
files work with them, either. Dealing with newline characters
requires extra complexity that I'm not willing to deal with when
managing alternates files.
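
A minimal sketch of the kind of check described above, not the actual
implementation. The helper name `coderepo_path_ok' and the sample paths
are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the source): true if $path contains
# no newline and is thus acceptable as a coderepo pathname.
sub coderepo_path_ok {
	my ($path) = @_;
	index($path, "\n") < 0;
}

# hypothetical candidate pathnames
my @candidates = ('/srv/git/a.git', "/srv/git/b\n.git");
for my $path (@candidates) {
	next if coderepo_path_ok($path);
	(my $shown = $path) =~ s/\n/\\n/g; # keep the warning on one line
	warn "E: `$shown' must not contain `\\n'\n";
}
my @ok = grep { coderepo_path_ok($_) } @candidates;
```

Rejecting such paths up-front means line-oriented files (e.g. alternates)
never need any quoting or escaping logic.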
|
|
This is a major step in solving the problem of having to
manually associate hundreds/thousands of coderepos with
hundreds/thousands of public-inboxes to power solver
(and more).
|
|
Start lowercasing newsgroup names automatically since uppercase
names are incompatible with IMAP and POP3 and also cause
problems with both -extindex and -cindex.
We'll also warn on eidx_key and newsgroup conflicts to avoid
sometimes subtle breakage when using -extindex and -cindex.
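
A rough sketch of lowercasing plus conflict detection, under the
assumption that a conflict means two inboxes ending up with the same
lowercased newsgroup. The `%configured' data and variable names are
hypothetical and this is not the actual config code:

```perl
use strict;
use warnings;

# hypothetical input: inbox name => configured newsgroup
my %configured = (
	meta => 'Inbox.Comp.Mail.Public-Inbox.Meta',
	test => 'inbox.comp.mail.public-inbox.meta',
);
my (%by_newsgroup, @conflicts);
for my $name (sort keys %configured) {
	my $ng = lc $configured{$name}; # lowercase automatically
	if (defined(my $prev = $by_newsgroup{$ng})) {
		# first one wins, later ones are only reported
		push @conflicts, "$ng: $name conflicts with $prev";
		next;
	}
	$by_newsgroup{$ng} = $name;
}
warn "W: $_\n" for @conflicts;
```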
|
|
This will open the door for us to drop `tie' usage from
ProcessIO completely in favor of OO method dispatch. While
OO method dispatches (e.g. `$fh->close') are slower than normal
subroutine calls, it hardly matters in this case since process
teardown is a fairly rare operation and we continue to use
`close($fh)' for Maildir writes.
|
|
This saves us some code, and is a small step towards getting
ProcessIO working with stat, fcntl and other perlops that don't
work with tied handles.
|
|
It's slightly better organized this way, especially since
`publicinboxLimiter' has its own user-facing config section
and knobs. I may use it in LeiMirror and CodeSearchIdx for
process management.
|
|
When using --get-urlmatch, we need a way to distinguish between
a key-only entry and a `key=val' pair, even if the `val' is empty.
In other words, git interprets `-c imap.debug' as true and
`-c imap.debug=' as false, but an untyped --get-urlmatch
invocation has no way to distinguish between them.
So we must specify we want `--bool' (we're avoiding `--type=bool'
since that only appears in git 2.18+).
Fixes: f170d220f876 (lei: fix `-c NAME=VALUE' config support)
|
|
It's a needless branch to maintain exclusively for our tests.
The `git config -l' output isn't pleasant to write in tests,
anyways. So just use heredocs to write git configs in their
native format rather than emulate the output of `git config -l'.
This does make the test suite do more work with temporary files
and process invocations, but it doesn't seem very measurable
when testing on tmpfs (TMPDIR=/dev/shm).
We'll make a minor improvement to TestCommon::tmpdir by allowing
it to return a single value (which I suspect we can rely on in
more places since File::Temp::Dir overloads stringification).
|
|
We can pass `-c NAME=VALUE' args directly to git-config without
needing a temporary directory nor file. Furthermore, this opens
the door to us being able to correctly handle `-c NAME=VALUE'
after `delete $lei->{cfg}' if we need to reload the config
during a command.
This tightens up error-checking for `lei config' and ensures we
can make config settings changes while using `-c NAME=VALUE'
instead of editing the temporary file.
The non-obvious part was avoiding the use of the -f/--file arg for
`git config' for read-only operations and instead relying on
`-c include.path=$ABS_PATH'. This is done by parsing the
switches to be passed to `git config' to determine if it's a
read-only operation or not.
|
|
It's how git-config works, so our `git config --list' parser
must be able to handle it. Fortunately this doesn't seem to
incur a measurable overhead when parsing a config with 50k
inboxes.
|
|
This will eventually allow associating coderepos with inboxes
and vice-versa; avoiding the need for manual configuration via
tedious publicinbox.*.coderepo directives.
I'm not sure how this should be stored for WWW, yet, but it's
required since it takes about 8 hours to do this fully across
lore and git.kernel.org.
|
|
Coderepo changes are probably more common than inbox changes, so
it probably makes sense to rescan and look for new coderepos on
404s, especially since we serve mirrored manifest.js.gz as-is.
I noticed my git.kernel.org mirror was serving manifest.js.gz
pointing to irretrievable repositories. This should stop that.
We'll also drop the underscore ('_') and use `coderepo'
everywhere to be consistent with our documentation.
We may serve new inboxes in a similar way down the line, too;
but this change only affects coderepos for now since we can
guarantee the inbox manifest.js.gz never contains irretrievable
inboxes as it's dynamically generated.
|
|
I just had to do a double-take here and look back at cgit.c
to see the ordering dependency wasn't a bug I introduced,
but mimicking what cgit does.
|
|
We'll also support the $base arg of File::Spec->rel2abs
since it should make codesearch indexing easier.
|
|
This should match behavior documented in gitglossary(7)
|
|
It seems suitable for the config class since globs are a
config/option thing.
|
|
We can avoid having to deal with weakening references and then
later creating strong references in WwwCoderepo.
|
|
It's a needless waste of memory and this change reduces the
WwwCoderepo object size by over 25% with over 1K repos.
Using the following check:
perl -MDevel::Size=total_size -I lib -MPublicInbox::WwwCoderepo -E \
'say total_size(PublicInbox::WwwCoderepo->new(PublicInbox::Config->new))'
before: 1612515
after: 1184385
|
|
Loading cgitrc (and associated projects.list) can save users
from defining as many individual coderepos.
xt/solver.t needs a use of `$_' replaced since that
gets clobbered while parsing cgitrc.
|
|
Since HTTP(S) URLs hit by lei or public-inbox-{clone,fetch} are
expected to be git endpoints anyways, fall back to using
http.proxy from git configs to save the user from having to
maintain the same configuration for different things.
|
|
We don't want to leak $? from `git config' failures into
lei nor public-inbox-* processes.
|
|
I'm not sure how to best make a UI for one coderepo to many
inboxes/extindices, yet; but at least allow a simple 1:1
mapping, for now. This ensures /$CODEREPO/$OID/s/ can work
as effectively as /$INBOX/$OID/s/ when looking for emails
associated with a git commit.
|
|
This also ensures we won't waste CPU cycles on snapshots
which aren't configured if somebody attempts them by
guessing URLs.
|
|
This field has been unneeded since commit 6890430df808
(cgit: fix fallout from lazy coderepo loading, 2021-03-18)
|
|
This will allow it to easily map a single coderepo to multiple
inboxes (or multiple coderepos to any number of inboxes).
For now, this is just a summary, but $REPO/$OID/s/ support
will be added, along with archive downloads.
Indexing of coderepos will probably be supported via -extindex,
only.
|
|
It's actually valid Perl to have "$foo ->{field} = ..."
but it's confusing and I noticed it while tracking down
a configuration error.
|
|
This means all Inbox, Git, Over, Msgmap, Search objects also get
deduplicated if they belong to the same config file, reducing
memory and FD usage. This helps save memory and improve cache
hit rates in -netd setups where NNTP, IMAP, HTTP, and POP3
servers run in the same process.
InboxIdle was the only bit which needed adjustment, but there
may be other bugs lurking despite all tests passing.
|
|
Old account expiry has not been implemented, but it seems to
work well with both mpop(1) and getmail(1). The strictness of
mpop was particularly helpful in ironing out bugs in our
implementation of (dreaded) message sequence numbers.
"EXPIRE 0" (RFC 2449) can theoretically save numerous "DELE"
commands, but that's untested by real-world clients. mpop
supports PIPELINING which is effective in hiding latency,
and the core networking functionality is already well-tested
from our NNTP and IMAP implementations.
Configuration requires "publicinbox.pop3state" to point to
a directory writable by the otherwise read-only daemon.
See public-inbox-pop3d(1) manpage for more usage details.
|
|
As stated in the previous change, conditional hash assignments
which trigger other hash assignments seem problematic, at times.
So replace:
	$h->{k} //= do { $h->{x} = ...; $val };
with:
	$h->{k} // do {
		$h->{x} = ...;
		$h->{k} = $val
	};
"||=" is affected the same way, and some instances of "||=" are
replaced with "//=" or "// do {", now.
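
For illustration only, a small self-contained sketch of the replacement
form. The accessor name `k_for', the `$computed' counter and the stored
values are hypothetical:

```perl
use strict;
use warnings;

my $h = {};
my $computed = 0; # counts how often the do-block runs

# hypothetical accessor: caches {k} and sets {x} as a side effect
sub k_for {
	my ($h) = @_;
	$h->{k} // do {
		$computed++;
		$h->{x} = 'side effect'; # the nested hash assignment
		$h->{k} = 'value'; # explicit store, also the block's result
	};
}
my $first = k_for($h); # runs the block
my $second = k_for($h); # served from $h->{k}
```

The assignment to {k} is now an ordinary statement inside the block
instead of being driven by "//=".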
|
|
This is necessary for showing "found $OID in $CODEREPO_URL"
in solver-generated pages ($INBOX_URL/$OID/s/).
|
|
We're not using them, anywhere.
|
|
Instead of passing the prefix section and key separately, pass
them together as is commonly done with git-config(1) usage as
well as our ->get_all API. This inconsistency in the get_1 API
is a needless footgun and confused me a bit while working on
"lei up" the other week.
|
|
This allows PublicInbox::WWW hosts to advertise the existence of
IMAP servers in addition to NNTP servers.
|
|
We no longer waste a precious hash slot for a per-Inbox
{nntpserver} if it's only configured globally for all inboxes.
|
|
This should improve the users' chances of seeing errors in
various git config files we use.
|
|
lei shouldn't become unusable if a config file is invalid.
Instead, show the "git config" stderr and attempt to continue
gracefully.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210910141157.6u5adehpx7wftkor@meerkat.local/
|
|
There's currently no support for altid with extindex, and
there's likely no legacy precedent for using altid like there is
with single public-inboxes.
|
|
This behaves identically to the lei external "boost" parameter in
prioritizing raw messages for extindex.
Relying exclusively on the config file order doesn't work well
for mirrors since it's impossible to guarantee config file
ordering via grokmirror hooks.
Config file ordering remains the default if boost is
unconfigured, or in case of ties.
Note: I chose the name "boost" rather than "priority" or "rank"
since I always get confused by whether higher or lower numbers
take precedence when it comes to kernel scheduling. "weight" is
also a part of Xapian API terminology, which we currently do not
expose to configuration (but may in the future).
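
A sketch of the ordering rule described above, assuming a higher boost
takes precedence and an unset boost counts as 0 (both assumptions, not
confirmed here). The inbox records and the `order' field are
hypothetical:

```perl
use strict;
use warnings;

# hypothetical inboxes in config file order; boost is optional
my @inboxes = (
	{ name => 'a' },
	{ name => 'b', boost => 10 },
	{ name => 'c', boost => 10 },
	{ name => 'd', boost => 5 },
);
my $i = 0;
$_->{order} = $i++ for @inboxes; # remember config file position

# higher boost first (assumed), config file order breaks ties
my @prioritized = sort {
	($b->{boost} // 0) <=> ($a->{boost} // 0) ||
		$a->{order} <=> $b->{order}
} @inboxes;
my @names = map { $_->{name} } @prioritized;
```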
|
|
We'll be using this in lei for watch configs.
|
|
When dealing with thousands of inboxes, displaying all of
them on a single page isn't going to work. So steal some
pagination and search results code from the message search
to generate some basic HTML output that looks good in w3m.
|
|
LD_PRELOAD sent by a client can't affect lei-daemon.
|
|
I don't know if it's worth it to sub (or super)class
PublicInbox::Config into something more generic for
lei, but this change simplifies a good chunk of lei
code that reuses the public-inbox config parsing.
|
|
A user may wish to clobber/refine existing search parameters
by issuing "lei q --save" again. Support that by overwriting
the lei.saved-search state file entirely.
We continue to preserve over.sqlite3 for deduplication purposes.
This way, we don't get something redundant like:
[lei]
q = term1
q = term2
q = term1
q = term2
q = term3
...whenever a user wants to refine their search. Instead,
we'll just have:
[lei]
q = term1
q = term2
q = term3
On the second go.
|
|
git 2.11 and earlier could not handle git directories with
newlines in them, nor does libgit2 support them.
Followup-to: d87dd0e679587043 ("config: reject `\n' in `inboxdir'")
|
|
We can't completely instantiate our cgit wrapper without
knowing cgit locations for serving static content.
Fixes: a5968dc059f655a ("config: lazy-load coderepos, support extindex")
|
|
Extsearch objects are duck-types of Inbox objects, and
are capable of supporting code repos all the same.
|