Date | Commit message (Collapse) |
|
Support on-demand generation of "/manifest.js.gz" for inboxes.
By default, this matches inboxes with URLs matching the given
request hostname by default.
This makes it easier to create full mirrors of several inboxes
without needing to configure static file serving.
cf. https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git
|
|
Since we already have a mechanism for hiding repositories from
the WWW listing, we might as well support another one for hiding
repositories from the upcoming manifest.js.gz generation.
|
|
While I don't expect git to suddenly start spewing non-ASCII
digits in places I'd expect ASCII, this would make things easier
for future hackers and reviewers.
|
|
Allowing admins to set non-ASCII CSS filenames could
cause unnecessary problems for client and proxies.
|
|
We do not support many mboxrd download range specifications at
the moment; but parsing non-ASCII characters isn't planned.
This makes no difference aside from being able to return 404
slightly earlier than we would've in the past.
|
|
We mainly support git-upload-pack; and maybe somebody uses
git-receive-pack with this. Perhaps other (experimental)
command names are acceptable. But it's unlikely anybody will
want Unicode command names for git services.
|
|
In case some BOFH decides to randomly create directories
using non-ASCII digits all over the place.
|
|
Don't inadvertantly serve git repos containing non-ASCII
digit characters.
|
|
git would not generate non-ASCII digits to describe
hunk offsets, so don't waste more time than necessary
to make sense of non-ASCII digit chars for line offsets.
|
|
Don't waste more cycles than necessary if somebody decides to
put non-ASCII digits in their ~/.public-inbox/config
|
|
Unlikely to matter, but who knows...
|
|
User input contains the darndest things. Don't waste more time
than necessary trying to parse dates out of non-ASCII digits.
|
|
Non-ASCII digits would be interpreted as zero when used as integers.
|
|
Non-ASCII digits would be interpreted as a zeroes as integers.
While we're at it, ensure the Status: code is an ASCII digit,
too; though I would not expect git-http-backend(1) or cgit(1)
start spewing non-ASCII digits at us.
|
|
Passing digits to `timegm' which it does not understand would
be a waste of time.
|
|
Non-ASCII digits aren't specified in RFC3977 for article numbers;
so don't waste a trip to SQLite only to turn up empty.
|
|
cgit uses atoi(3), and now we can retain compatibility.
|
|
Our Hval::to_filename sub has always been strict about emitting
ASCII-only characters for ViewVCS "raw" links.
However, somebody could manually generate a filename with
non-ASCII words for somebody else to download (we have no
cheap and fast way of mapping filenames back to blobs for
validation).
|
|
We don't want to emit funky URLs which can be lost in
translation or cause problems with non-Unicode-aware
clients.
Then, don't accept non-ASCII filenames in URLs, since
a manually-generated URL/filename in attachment downloads
could be used for Unicode homographs to confuse folks who
down the attachment.
|
|
AFAIK all names of charsets are ASCII, so passing non-ASCII
characters from emails to clients would probably confuse clients.
|
|
We only care about the hostname portion for matching,
so this change is probably inconsequential.
|
|
I'm not sure what middlewares care for for SERVER_PORT; but
allowing non-ASCII digits seems non-sensical, here.
|
|
We don't want to waste cycles passing non-ASCII characters
to git.
|
|
Its result is used for HTML anchors and such.
|
|
RFC3977 does not have provisions for whitespace beyond ASCII
TAB, SP, CR and LF. I doubt there's any NNTP clients broken
enough to be sending non-ASCII whitespace delimiters.
We're probably excessively liberal regarding TAB acceptance,
even; but it's probably too late to change at this point...
|
|
We aren't able to make sense of non-ASCII digits
cf. perlrecharclass(1) / "Digits" section
|
|
The "\w" character class in Perl matches any word characters
in the Unicode database, not just ASCII characters. So we
must be prepared for that and generate links to IDNs.
|
|
We don't need and won't be needing per-socket PostLoopCallbacks.
|
|
We never enable write watches ourselves for HTTP and NNTP,
and only enable the write watch with EvCleanup because it's
an "always on" watch.
|
|
This was never used in Danga::Socket 1.61, either.
|
|
I've used Danga::Socket for well over a decade in various
projects at this point and have never seen the need for it.
If such a bug ever happens; the process should fall over so
it gets fixed ASAP.
|
|
This is not used by perlbal for OpenSSL support, either;
and it does not appear to be the right layer for doing
write translations anyways (IO::Socket::SSL uses `tie').
|
|
Sometimes I get bored with the email part of this project and
need a distraction :P
|
|
ToClose and HaveEpoll are of no use to us and I see no
future use for them, either.
|
|
Even though we currently don't use it repeatedly, ->Reset
should close() kqueue FDs and not cause the process to run
out of descriptors.
Add a close-on-exec test while we're at it.
|
|
We should not be leaking these FDs to git(1) processes,
in case git has a bug that causes it to access the wrong FD.
|
|
No reason to leave that (usually) empty file open after killing off
"cat-file --batch-check". This wasn't an unbound leak, though,
as respawning the --batch-check process would've clobbered the
old err_c file.
|
|
A constant stream of traffic to either httpd/nntpd would mean
git-cat-file processes never expire. Things can go bad after a
full repack, as a full repack will unlink old pack indices and
git-cat-file does not currently detect unlinked files.
We could do something complicated by recursively stat-ing
objects/pack of every git directory and alternate;
but that's probably not worth the trouble compared to
occasionally restarting the cat-file process.
So simplify the code and let httpd/nntpd expire them
periodically, since spawning a "git-cat-file --batch" process
isn't too expensive. We already spawn for every request which
hits git-http-backend, cgit, and git-apply.
In the future, we may optionally support the Git::Raw module
to avoid IPC; but we must remain careful to not leave lingering
FDs open to unlinked files after repack.
|
|
This is worth a 1-2% speedup in t/perf-msgview.t rendering 2620
messages currently in https://public-inbox.org/meta/
|
|
We don't need to use git to check ancestry if object IDs
match on a string comparison.
This saves 100ms or so and brings down the ~0.5s no-op time on
lore.kernel.org/lkml down to ~0.4s.
|
|
Creating mm_tmp is an expensive operation with large inboxes
and can be avoided if there are no new messages to process.
Since git-fetch(1) currently lacks an --exit-code option(*),
mirrors will run `public-inbox-index' unconditionally after
fetch, which is an expensive op if it needs to duplicate
a large SQLite DB.
This speeds up the mirror case of:
git --git-dir=git/$EPOCH.git fetch && public-inbox-index
This reduces the no-op `public-inbox-index' time from over 8s to
~0.5s on a (currently) 7-epoch clone of https://lore.kernel.org/lkml/
on my system.
(*) WIP --exit-code for git-fetch:
https://public-inbox.org/git/87ftphw7mv.fsf@evledraar.gmail.com/
|
|
This will make future changes easier-to-follow.
|
|
It'll make it easier to detect if we have anything to
unindex and run git-log on, at all.
|
|
And use it from Admin.
It's easy to tell what indexlevel=basic is from unconfigured
inboxes, but distinguishing between 'medium' and 'full' would
require stat()-ing position.* files which is fragile and
Xapian-implementation-dependent.
So use the metadata facility of Xapian and store it in the main
partition so Admin tools can deal better with unconfigured
inboxes copied using generic tools like cp(1) or rsync(1).
|
|
We can show progress whenever we commit changes to the FS.
|
|
It doesn't implement progress of batches, yet, but it wires
up the parsing of the command-line while preserving output
compatibility.
This output is NOT meant to be stable.
|
|
And use singular `opt' to be consistent with the common name
of 'getopt'.
|
|
Hopefully this improves maintainability by allowing Perl
to do some arg checking for us.
|
|
We don't need to stuff that into $self (V2Writable) which can be
longer-lived than a ->index_sync invocation.
|
|
Yet another temporary variable with no use outside of index_sync.
|