Date | Commit message |
|
We don't want to waste cycles passing non-ASCII characters
to git.
|
|
Its result is used for HTML anchors and such.
|
|
RFC3977 does not have provisions for whitespace beyond ASCII
TAB, SP, CR and LF. I doubt there are any NNTP clients broken
enough to be sending non-ASCII whitespace delimiters.
We're probably excessively liberal regarding TAB acceptance,
even; but it's probably too late to change at this point...
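To illustrate the difference, here is a minimal Python sketch (not the project's Perl code). `split_command` is a hypothetical helper that splits a command line on ASCII SP and TAB only, so a non-ASCII space such as U+00A0 stays inside its token, whereas a Unicode-aware `\s` would split on it.

```python
import re

# Delimiters limited to ASCII SP and TAB; CR/LF are stripped as line endings.
ASCII_DELIM = re.compile(r"[ \t]+")

# For contrast: in Python str patterns, \s also matches non-ASCII whitespace.
UNICODE_DELIM = re.compile(r"\s+")


def split_command(line):
    """Split a command line on ASCII SP/TAB only (hypothetical helper)."""
    return ASCII_DELIM.split(line.rstrip("\r\n"))


if __name__ == "__main__":
    print(split_command("GROUP \tinbox.test\r\n"))
    print(split_command("GROUP\u00a0inbox.test\r\n"))
    print(UNICODE_DELIM.split("GROUP\u00a0inbox.test"))
```

Spelling out the delimiter set keeps the parser independent of whatever the regex engine's Unicode tables consider whitespace.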
|
|
We aren't able to make sense of non-ASCII digits;
cf. perlrecharclass(1) / "Digits" section.
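Python's `re` module treats `\d` much like Perl does for Unicode strings, so the point can be sketched there (an analogue only, not the project's code). `parse_article_number` is a hypothetical helper that accepts only `[0-9]`.

```python
import re

# Explicit ASCII range: matches only 0-9.
ASCII_DIGITS = re.compile(r"[0-9]+")

# \d in a str pattern matches any Unicode decimal digit by default.
UNICODE_DIGITS = re.compile(r"\d+")

ARABIC_INDIC_123 = "\u0661\u0662\u0663"


def parse_article_number(token):
    """Return an int for ASCII-digit tokens, otherwise None (hypothetical)."""
    if ASCII_DIGITS.fullmatch(token):
        return int(token)
    return None


if __name__ == "__main__":
    print(parse_article_number("123"))
    print(parse_article_number(ARABIC_INDIC_123))
    print(bool(UNICODE_DIGITS.fullmatch(ARABIC_INDIC_123)))
```

In Python the `re.ASCII` flag restricts `\d` to `[0-9]` as well; the Perl counterpart is the `/a` modifier described in perlre(1).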
|
|
The "\w" character class in Perl matches any word characters
in the Unicode database, not just ASCII characters. So we
must be prepared for that and generate links to IDNs.
|
|
* origin/git-cleanup:
git: drop the deleted err_c file
git: unconditional expiry
|
|
* origin/ds:
ds: remove PLCMap and per-socket PostLoopCallback
ds: drop write_set_watch field
ds: drop unused EVENT: label in epoll code path
ds: drop checks for invalid descriptors
ds: drop set_writer_func support
ds: add a note about planned future changes
ds: drop more unused subs
|
|
We don't need and won't be needing per-socket PostLoopCallbacks.
|
|
In case we encounter an odd system which has Search::Xapian
but not DBD::SQLite.
|
|
We never enable write watches ourselves for HTTP and NNTP,
and only enable the write watch with EvCleanup because it's
an "always on" watch.
|
|
This was never used in Danga::Socket 1.61, either.
|
|
I've used Danga::Socket for well over a decade in various
projects at this point and have never seen the need for it.
If such a bug ever happens, the process should fall over so
it gets fixed ASAP.
|
|
This is not used by perlbal for OpenSSL support, either;
and it does not appear to be the right layer for doing
write translations anyways (IO::Socket::SSL uses `tie').
|
|
Sometimes I get bored with the email part of this project and
need a distraction :P
|
|
ToClose and HaveEpoll are of no use to us and I see no
future use for them, either.
|
|
Even though we currently don't use it repeatedly, ->Reset
should close() kqueue FDs and not cause the process to run
out of descriptors.
Add a close-on-exec test while we're at it.
|
|
This is true as of e220b8b2ee5cfd458167dc2c6c92726352c4c80e
("Merge remote-tracking branch 'origin/xap-optional' into master")
|
|
We should not be leaking these FDs to git(1) processes,
in case git has a bug that causes it to access the wrong FD.
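As a rough illustration of what a close-on-exec check looks like, here is a Unix-only Python sketch (the project itself is Perl; `is_cloexec` and `set_cloexec` are hypothetical helper names). A descriptor with FD_CLOEXEC set is closed automatically when the process exec()s a child such as git(1).

```python
import fcntl
import os


def is_cloexec(fd):
    """Return True if FD_CLOEXEC is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_cloexec(fd):
    """Set FD_CLOEXEC on fd, preserving any other descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


if __name__ == "__main__":
    r, w = os.pipe()              # Python creates these non-inheritable
    os.set_inheritable(r, True)   # simulate a descriptor that would leak
    print(is_cloexec(r))
    set_cloexec(r)
    print(is_cloexec(r))
    os.close(r)
    os.close(w)
```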
|
|
No reason to leave that (usually) empty file open after killing off
"cat-file --batch-check". This wasn't an unbounded leak, though,
as respawning the --batch-check process would've clobbered the
old err_c file.
|
|
A constant stream of traffic to either httpd/nntpd would mean
git-cat-file processes never expire. Things can go bad after a
full repack, as a full repack will unlink old pack indices and
git-cat-file does not currently detect unlinked files.
We could do something complicated by recursively stat-ing
objects/pack of every git directory and alternate;
but that's probably not worth the trouble compared to
occasionally restarting the cat-file process.
So simplify the code and let httpd/nntpd expire them
periodically, since spawning a "git-cat-file --batch" process
isn't too expensive. We already spawn for every request which
hits git-http-backend, cgit, and git-apply.
In the future, we may optionally support the Git::Raw module
to avoid IPC; but we must remain careful to not leave lingering
FDs open to unlinked files after repack.
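The idea can be sketched in Python as follows (a simplified analogue, not the project's implementation). `BatchProcess` is a hypothetical wrapper: `cleanup` always stops the child no matter how recently it was used, and the next `request` respawns it. A small echo script stands in for "git cat-file --batch" so the sketch runs without a repository.

```python
import subprocess
import sys

# Stand-in for a long-lived batch process: echoes each input line back.
ECHO = [sys.executable, "-c",
        "import sys\n"
        "for line in sys.stdin:\n"
        "    sys.stdout.write(line)\n"
        "    sys.stdout.flush()\n"]


class BatchProcess:
    def __init__(self, argv):
        self.argv = argv
        self.proc = None

    def request(self, line):
        """Send one line, spawning the child lazily if it is not running."""
        if self.proc is None:
            self.proc = subprocess.Popen(
                self.argv, stdin=subprocess.PIPE,
                stdout=subprocess.PIPE, text=True)
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def cleanup(self):
        """Unconditionally stop the child; meant to be run from a timer."""
        proc, self.proc = self.proc, None
        if proc is not None:
            proc.stdin.close()   # EOF lets the child exit on its own
            proc.wait()
            proc.stdout.close()


if __name__ == "__main__":
    bp = BatchProcess(ECHO)
    print(bp.request("first"))
    bp.cleanup()                  # periodic expiry, regardless of traffic
    print(bp.request("second"))   # transparently respawned
    bp.cleanup()
```

Because the child is dropped on every timer tick, it cannot hold descriptors to files unlinked by a repack for longer than one expiry interval.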
|
|
Taking one step out of setting up a performant deployment could
make setup and administration easier (at the cost of installing
an extra-but-common XS module). This can also be useful for
the day NNTP servers see hug-of-death events.
|
|
This is worth a 1-2% speedup in t/perf-msgview.t rendering 2620
messages currently in https://public-inbox.org/meta/
|
|
* origin/v2-noop-speedup:
v2writable: short-circuit is_ancestor check on equality
v2writable: avoid mm_tmp creation without regen
v2writable: hoist out index_epoch sub
v2writable: split off unindex_range mapping
|
|
I don't have time to check and train spam for all these
projects.
Spam filtering is especially difficult on ruby-core: it
enters via Redmine, so it doesn't have a distinct Received:
chain, and also gets mixed with non-spam bug-report text,
throwing off Bayes training.
And I'm not sure if those mirrors did anybody any good, even;
so let's not say it's a "service" to anybody :P
The actual mirrors remain up, for now, but who knows...
I care about decentralization too much to ask anybody
to trust me to keep anything up :P
|
|
We don't need to use git to check ancestry if object IDs
match on a string comparison.
This saves 100ms or so and brings the ~0.5s no-op time on
lore.kernel.org/lkml down to ~0.4s.
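A minimal Python sketch of the short-circuit, assuming a hypothetical `is_ancestor` helper (the project's actual code is Perl and may differ). The `run` parameter exists only so the sketch can be exercised without a repository.

```python
import subprocess


def is_ancestor(git_dir, cur, tip, run=subprocess.run):
    """Return True if commit `cur` is an ancestor of (or equal to) `tip`."""
    # Identical object IDs: no need to spawn git at all.
    if cur == tip:
        return True
    # `git merge-base --is-ancestor` exits 0 when cur is an ancestor of tip.
    res = run(["git", f"--git-dir={git_dir}",
               "merge-base", "--is-ancestor", cur, tip])
    return res.returncode == 0


if __name__ == "__main__":
    calls = []

    def fake_run(argv):
        # Record the command and pretend git answered "not an ancestor".
        calls.append(argv)
        return type("Result", (), {"returncode": 1})()

    print(is_ancestor("x.git", "abc", "abc", run=fake_run), len(calls))
    print(is_ancestor("x.git", "abc", "def", run=fake_run), len(calls))
```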
|
|
Creating mm_tmp is an expensive operation with large inboxes
and can be avoided if there are no new messages to process.
Since git-fetch(1) currently lacks an --exit-code option(*),
mirrors will run `public-inbox-index' unconditionally after
fetch, which is an expensive op if it needs to duplicate
a large SQLite DB.
This speeds up the mirror case of:
git --git-dir=git/$EPOCH.git fetch && public-inbox-index
This reduces the no-op `public-inbox-index' time from over 8s to
~0.5s on a (currently) 7-epoch clone of https://lore.kernel.org/lkml/
on my system.
(*) WIP --exit-code for git-fetch:
https://public-inbox.org/git/87ftphw7mv.fsf@evledraar.gmail.com/
|
|
This will make future changes easier-to-follow.
|
|
It'll make it easier to detect if we have anything to
unindex and run git-log on, at all.
|
|
And use it from Admin.
It's easy to detect indexlevel=basic in unconfigured
inboxes, but distinguishing between 'medium' and 'full' would
require stat()-ing position.* files which is fragile and
Xapian-implementation-dependent.
So use the metadata facility of Xapian and store it in the main
partition so Admin tools can deal better with unconfigured
inboxes copied using generic tools like cp(1) or rsync(1).
|
|
It's annoying for people using "git fetch && public-inbox-index"
as one user; and running -httpd/-nntpd as a different user
(where users see different config files).
|
|
* v2-idx-progress:
v2writable: show progress updates for index_sync
index: support --verbose option
v2writable: move index_sync options to sync state
v2writable: use prototypes for internal subs
v2writable: localize unindex-range.$EPOCH to $sync state
v2writable: move {ranges} into $sync state
v2writable: move {regen} into $sync state
v2writable: move {reindex} field to $sync state
v2writable: sync: move delete markers into $sync state
v2writable: introduce $sync state and put mm_tmp in it
|
|
We already "use warnings" everywhere, but could miss some spots.
This ought to cover that, and Perl module authors are usually
consistent enough about avoiding warnings that we won't clutter
our test outputs.
|
|
This can be useful for limiting test resource use without relying
on remembering the variable command-line.
|
|
We can show progress whenever we commit changes to the FS.
|
|
It doesn't implement progress of batches, yet, but it wires
up the parsing of the command-line while preserving output
compatibility.
This output is NOT meant to be stable.
|
|
And use singular `opt' to be consistent with the common name
of 'getopt'.
|
|
Hopefully this improves maintainability by allowing Perl
to do some arg checking for us.
|
|
We don't need to stuff that into $self (V2Writable) which can be
longer-lived than a ->index_sync invocation.
|
|
Yet another temporary variable with no use outside of index_sync.
|
|
regen is always enabled for index_sync nowadays (and has
been for a while).
Rename `index_prepare' to `sync_prepare' to show it's for
->index_sync; and not the online indexing we do for ->add.
|
|
reindexing info is not used outside of the index_sync code path.
|
|
Another small step to reduce parameters passed to reindex_oid.
|
|
A first step towards making the v2 index_sync code
easier-to-follow. More fields to follow...
|
|
`public-inbox-index --reindex' could cause NNTP article number
gaps to form when it also has to deal with new,
never-before-seen commits in mirrors running off `git fetch'.
Fix this by running two distinct invocations of ->index_sync;
once to only reindex old commits, and a second time to index
new commits.
This does not appear to be a problem on v1 at the moment,
but I'll need more time to analyze this.
|
|
We can't pass an empty string to `git merge-base --is-ancestor'.
AFAIK, this did NOT present issues in the current test suite.
|
|
It did not cause a test failure because the default fallback
is `indexlevel=full'.
|
|
Streaming large blobs can take multiple iterations of the event
loop in our -httpd; so we must not let the File::Temp::Dir
result go out-of-scope when streaming large blobs created from
patches.
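The lifetime issue can be sketched with a Python analogue (not the project's Perl code). `stream_blob` is a hypothetical helper: the generator holds the `TemporaryDirectory` object until the last chunk has been produced, so the directory outlives every iteration and is removed only afterwards.

```python
import os
import tempfile


def stream_blob(data, chunk_size=4):
    """Write data into a temp dir and return (path, chunk iterator)."""
    tmp = tempfile.TemporaryDirectory()
    path = os.path.join(tmp.name, "blob")
    with open(path, "wb") as fh:
        fh.write(data)

    def chunks():
        offset = 0
        try:
            while True:
                # Reopen by path on each iteration, as a later event-loop
                # pass would; this fails if the directory is already gone.
                with open(path, "rb") as fh:
                    fh.seek(offset)
                    buf = fh.read(chunk_size)
                if not buf:
                    return
                offset += len(buf)
                yield buf
        finally:
            # Referencing `tmp` here keeps the directory alive until the
            # stream is finished or the generator is closed.
            tmp.cleanup()

    return path, chunks()


if __name__ == "__main__":
    path, it = stream_blob(b"abcdefghij")
    print(next(it), os.path.exists(path))
    print(b"".join(it), os.path.exists(path))
```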
|
|
Fix a misspelling and ensure line context is printed by
`die' by leaving out the final '\n'. Also, `delete' was
pointless.
|