Date | Commit message (Collapse) |
|
While I don't expect git to suddenly start spewing non-ASCII
digits in places I'd expect ASCII, this would make things easier
for future hackers and reviewers.
|
|
Allowing admins to set non-ASCII CSS filenames could
cause unnecessary problems for client and proxies.
|
|
We do not support many mboxrd download range specifications at
the moment; but parsing non-ASCII characters isn't planned.
This makes no difference aside from being able to return 404
slightly earlier than we would've in the past.
|
|
We mainly support git-upload-pack; and maybe somebody uses
git-receive-pack with this. Perhaps other (experimental)
command names are acceptable. But it's unlikely anybody will
want Unicode command names for git services.
|
|
In case some BOFH decides to randomly create directories
using non-ASCII digits all over the place.
|
|
Don't inadvertantly serve git repos containing non-ASCII
digit characters.
|
|
git would not generate non-ASCII digits to describe
hunk offsets, so don't waste more time than necessary
to make sense of non-ASCII digit chars for line offsets.
|
|
Don't waste more cycles than necessary if somebody decides to
put non-ASCII digits in their ~/.public-inbox/config
|
|
Unlikely to matter, but who knows...
|
|
User input contains the darndest things. Don't waste more time
than necessary trying to parse dates out of non-ASCII digits.
|
|
Non-ASCII digits would be interpreted as zero when used as integers.
|
|
Non-ASCII digits would be interpreted as a zeroes as integers.
While we're at it, ensure the Status: code is an ASCII digit,
too; though I would not expect git-http-backend(1) or cgit(1)
start spewing non-ASCII digits at us.
|
|
Passing digits to `timegm' which it does not understand would
be a waste of time.
|
|
Non-ASCII digits aren't specified in RFC3977 for article numbers;
so don't waste a trip to SQLite only to turn up empty.
|
|
cgit uses atoi(3), and now we can retain compatibility.
|
|
Our Hval::to_filename sub has always been strict about emitting
ASCII-only characters for ViewVCS "raw" links.
However, somebody could manually generate a filename with
non-ASCII words for somebody else to download (we have no
cheap and fast way of mapping filenames back to blobs for
validation).
|
|
We don't want to emit funky URLs which can be lost in
translation or cause problems with non-Unicode-aware
clients.
Then, don't accept non-ASCII filenames in URLs, since
a manually-generated URL/filename in attachment downloads
could be used for Unicode homographs to confuse folks who
down the attachment.
|
|
AFAIK all names of charsets are ASCII, so passing non-ASCII
characters from emails to clients would probably confuse clients.
|
|
We only care about the hostname portion for matching,
so this change is probably inconsequential.
|
|
I'm not sure what middlewares care for for SERVER_PORT; but
allowing non-ASCII digits seems non-sensical, here.
|
|
We don't want to waste cycles passing non-ASCII characters
to git.
|
|
Its result is used for HTML anchors and such.
|
|
RFC3977 does not have provisions for whitespace beyond ASCII
TAB, SP, CR and LF. I doubt there's any NNTP clients broken
enough to be sending non-ASCII whitespace delimiters.
We're probably excessively liberal regarding TAB acceptance,
even; but it's probably too late to change at this point...
|
|
We aren't able to make sense of non-ASCII digits
cf. perlrecharclass(1) / "Digits" section
|
|
The "\w" character class in Perl matches any word characters
in the Unicode database, not just ASCII characters. So we
must be prepared for that and generate links to IDNs.
|
|
We don't need and won't be needing per-socket PostLoopCallbacks.
|
|
We never enable write watches ourselves for HTTP and NNTP,
and only enable the write watch with EvCleanup because it's
an "always on" watch.
|
|
This was never used in Danga::Socket 1.61, either.
|
|
I've used Danga::Socket for well over a decade in various
projects at this point and have never seen the need for it.
If such a bug ever happens; the process should fall over so
it gets fixed ASAP.
|
|
This is not used by perlbal for OpenSSL support, either;
and it does not appear to be the right layer for doing
write translations anyways (IO::Socket::SSL uses `tie').
|
|
Sometimes I get bored with the email part of this project and
need a distraction :P
|
|
ToClose and HaveEpoll are of no use to us and I see no
future use for them, either.
|
|
Even though we currently don't use it repeatedly, ->Reset
should close() kqueue FDs and not cause the process to run
out of descriptors.
Add a close-on-exec test while we're at it.
|
|
We should not be leaking these FDs to git(1) processes,
in case git has a bug that causes it to access the wrong FD.
|
|
No reason to leave that (usually) empty file open after killing off
"cat-file --batch-check". This wasn't an unbound leak, though,
as respawning the --batch-check process would've clobbered the
old err_c file.
|
|
A constant stream of traffic to either httpd/nntpd would mean
git-cat-file processes never expire. Things can go bad after a
full repack, as a full repack will unlink old pack indices and
git-cat-file does not currently detect unlinked files.
We could do something complicated by recursively stat-ing
objects/pack of every git directory and alternate;
but that's probably not worth the trouble compared to
occasionally restarting the cat-file process.
So simplify the code and let httpd/nntpd expire them
periodically, since spawning a "git-cat-file --batch" process
isn't too expensive. We already spawn for every request which
hits git-http-backend, cgit, and git-apply.
In the future, we may optionally support the Git::Raw module
to avoid IPC; but we must remain careful to not leave lingering
FDs open to unlinked files after repack.
|
|
This is worth a 1-2% speedup in t/perf-msgview.t rendering 2620
messages currently in https://public-inbox.org/meta/
|
|
We don't need to use git to check ancestry if object IDs
match on a string comparison.
This saves 100ms or so and brings down the ~0.5s no-op time on
lore.kernel.org/lkml down to ~0.4s.
|
|
Creating mm_tmp is an expensive operation with large inboxes
and can be avoided if there are no new messages to process.
Since git-fetch(1) currently lacks an --exit-code option(*),
mirrors will run `public-inbox-index' unconditionally after
fetch, which is an expensive op if it needs to duplicate
a large SQLite DB.
This speeds up the mirror case of:
git --git-dir=git/$EPOCH.git fetch && public-inbox-index
This reduces the no-op `public-inbox-index' time from over 8s to
~0.5s on a (currently) 7-epoch clone of https://lore.kernel.org/lkml/
on my system.
(*) WIP --exit-code for git-fetch:
https://public-inbox.org/git/87ftphw7mv.fsf@evledraar.gmail.com/
|
|
This will make future changes easier-to-follow.
|
|
It'll make it easier to detect if we have anything to
unindex and run git-log on, at all.
|
|
And use it from Admin.
It's easy to tell what indexlevel=basic is from unconfigured
inboxes, but distinguishing between 'medium' and 'full' would
require stat()-ing position.* files which is fragile and
Xapian-implementation-dependent.
So use the metadata facility of Xapian and store it in the main
partition so Admin tools can deal better with unconfigured
inboxes copied using generic tools like cp(1) or rsync(1).
|
|
We can show progress whenever we commit changes to the FS.
|
|
It doesn't implement progress of batches, yet, but it wires
up the parsing of the command-line while preserving output
compatibility.
This output is NOT meant to be stable.
|
|
And use singular `opt' to be consistent with the common name
of 'getopt'.
|
|
Hopefully this improves maintainability by allowing Perl
to do some arg checking for us.
|
|
We don't need to stuff that into $self (V2Writable) which can be
longer-lived than a ->index_sync invocation.
|
|
Yet another temporary variable with no use outside of index_sync.
|
|
regen is always enabled for index_sync nowadays (and has
been for a while).
Rename `index_prepare' to `sync_prepare' to show it's for
->index_sync; and not the online indexing we do for ->add.
|
|
reindexing info is not used outside of the index_sync code path.
|