path: root/lib
Date Commit message
2019-06-05 tighten up digit matches to ASCII for git output
While I don't expect git to suddenly start spewing non-ASCII digits in places I'd expect ASCII, this would make things easier for future hackers and reviewers.
2019-06-04 www: require ASCII word characters for CSS filenames
Allowing admins to set non-ASCII CSS filenames could cause unnecessary problems for clients and proxies.
2019-06-04 www: require ASCII range for mbox downloads
We do not support many mboxrd download range specifications at the moment; but parsing non-ASCII characters isn't planned. This makes no difference aside from being able to return 404 slightly earlier than we would've in the past.
2019-06-04 githttpbackend: require ASCII in path
We mainly support git-upload-pack; and maybe somebody uses git-receive-pack with this. Perhaps other (experimental) command names are acceptable. But it's unlikely anybody will want Unicode command names for git services.
2019-06-04 require ASCII digits for local FS items
In case some BOFH decides to randomly create directories using non-ASCII digits all over the place.
2019-06-04 www: require ASCII digit for git epoch
Don't inadvertently serve git repos containing non-ASCII digit characters.
2019-06-04 solver|viewdiff: restrict digit matches to ASCII
git would not generate non-ASCII digits to describe hunk offsets, so don't waste more time than necessary to make sense of non-ASCII digit chars for line offsets.
2019-06-04 inbox: require ASCII digits for feedmax var
Don't waste more cycles than necessary if somebody decides to put non-ASCII digits in their ~/.public-inbox/config
2019-06-04 filter/rubylang: require ASCII digit for mailcount
Unlikely to matter, but who knows...
2019-06-04 msgtime: require ASCII digits for parsing dates
User input contains the darndest things. Don't waste more time than necessary trying to parse dates out of non-ASCII digits.
2019-06-04 searchview: do not allow non-ASCII offsets and limits
Non-ASCII digits would be interpreted as zero when used as integers.
2019-06-04 githttpbackend: require Range:, Status: to be ASCII digits
Non-ASCII digits would be interpreted as zeroes when used as integers. While we're at it, ensure the Status: code is an ASCII digit, too; though I would not expect git-http-backend(1) or cgit(1) to start spewing non-ASCII digits at us.
2019-06-04 view: require YYYYmmDD(HHMMSS) timestamps to be ASCII
Passing digits to `timegm' which it does not understand would be a waste of time.
2019-06-04 newswww: only accept ASCII digits as article numbers
Non-ASCII digits aren't specified in RFC3977 for article numbers; so don't waste a trip to SQLite only to turn up empty.
2019-06-04 config: do not accept non-ASCII digits in cgitrc params
cgit uses atoi(3), and now we can retain compatibility.
2019-06-04 www: require ASCII filenames in git blob downloads
Our Hval::to_filename sub has always been strict about emitting ASCII-only characters for ViewVCS "raw" links. However, somebody could manually generate a filename with non-ASCII words for somebody else to download (we have no cheap and fast way of mapping filenames back to blobs for validation).
2019-06-04 www: only emit ASCII chars in attachment filenames
We don't want to emit funky URLs which can be lost in translation or cause problems with non-Unicode-aware clients. Then, don't accept non-ASCII filenames in URLs, since a manually-generated URL/filename in attachment downloads could be used for Unicode homographs to confuse folks who download the attachment.
2019-06-04 wwwattach: only pass the charset through if ASCII
AFAIK all names of charsets are ASCII, so passing non-ASCII characters from emails to clients would probably confuse clients.
2019-06-04 wwwlisting: require ASCII digit for port number
We only care about the hostname portion for matching, so this change is probably inconsequential.
2019-06-04 http: require SERVER_PORT to be ASCII digit
I'm not sure what middlewares care for SERVER_PORT; but allowing non-ASCII digits seems nonsensical here.
2019-06-04 feed: only accept ASCII digits for ref~$N
We don't want to waste cycles passing non-ASCII characters to git.
2019-06-04 mid: id_compress requires ASCII-clean words
Its result is used for HTML anchors and such.
2019-06-04 nntp: ensure we only handle ASCII whitespace
RFC3977 does not have provisions for whitespace beyond ASCII TAB, SP, CR and LF. I doubt there are any NNTP clients broken enough to be sending non-ASCII whitespace delimiters. We're probably excessively liberal regarding TAB acceptance, even; but it's probably too late to change at this point...
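To illustrate the whitespace restriction described above, here is a minimal sketch. It is written in Python rather than Perl purely for illustration and is not project code; Python's `\s` on text patterns is likewise Unicode-aware. The helper `split_command` and the sample group name are hypothetical.

```python
import re

# Only the four delimiters named in the commit message: SP, TAB, CR, LF.
ASCII_WS = re.compile(r"[ \t\r\n]+")
# Unicode-aware class: also matches e.g. U+00A0 NO-BREAK SPACE.
ANY_WS = re.compile(r"\s+")

def split_command(line, ws=ASCII_WS):
    """Hypothetical helper: split one NNTP command line into tokens."""
    return [tok for tok in ws.split(line) if tok]

# A command whose "space" is really a NO-BREAK SPACE.
line = "GROUP\u00a0inbox.example\r\n"
strict = split_command(line)           # stays one (invalid) token
liberal = split_command(line, ANY_WS)  # silently splits into two tokens
```

With the ASCII-only class the malformed line stays a single unrecognized token and can be rejected early, instead of being treated as a valid two-word command.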
2019-06-04 nntp: be explicit about ASCII digit matches
We aren't able to make sense of non-ASCII digits; cf. the "Digits" section of perlrecharclass(1).
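The distinction behind this whole series can be shown in a few lines. The sketch below uses Python only as an analogue (an assumption of this example, not project code): for text patterns, Python's `\d` matches any Unicode decimal digit, much like Perl's `\d`, while `[0-9]` matches ASCII only. The helper `parse_article_number` is hypothetical.

```python
import re

UNICODE_DIGITS = re.compile(r"\d+")    # any Unicode decimal digit
ASCII_DIGITS = re.compile(r"[0-9]+")   # ASCII 0-9 only

def parse_article_number(text):
    """Hypothetical helper: return an int for ASCII-digit input, else None."""
    return int(text) if ASCII_DIGITS.fullmatch(text) else None

# ARABIC-INDIC DIGITS ONE, TWO, THREE
arabic_indic = "\u0661\u0662\u0663"

loose_match = UNICODE_DIGITS.fullmatch(arabic_indic) is not None
strict_match = ASCII_DIGITS.fullmatch(arabic_indic) is not None
```

Rejecting such input at the pattern level means later code never has to decide what a non-ASCII digit string should mean as a number.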
2019-06-04 linkify: support Internationalized Domain Names in URLs
The "\w" character class in Perl matches any word characters in the Unicode database, not just ASCII characters. So we must be prepared for that and generate links to IDNs.
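As a rough illustration of why a Unicode-aware word class picks up IDN host names, here is a sketch in Python (used as an analogue of the Perl behavior, not project code). Both patterns and the helper `first_host` are hypothetical and much simpler than a real linkifier.

```python
import re

# \w in a text pattern matches Unicode word characters, so the host is captured whole.
UNICODE_HOST = re.compile(r"https?://([\w.-]+)")
# An explicit ASCII class stops at the first non-ASCII character.
ASCII_HOST = re.compile(r"https?://([A-Za-z0-9_.-]+)")

def first_host(pattern, text):
    """Hypothetical helper: return the first captured host name, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

# Host name containing U+00FC (LATIN SMALL LETTER U WITH DIAERESIS).
text = "see https://b\u00fccher.example/list for details"
unicode_host = first_host(UNICODE_HOST, text)
ascii_host = first_host(ASCII_HOST, text)
```

The Unicode-aware pattern yields the full host, while the ASCII-only one truncates it after the first letter, which is the kind of mismatch a linkifier has to be prepared for.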
2019-06-03 ds: remove PLCMap and per-socket PostLoopCallback
We don't need and won't be needing per-socket PostLoopCallbacks.
2019-06-02 ds: drop write_set_watch field
We never enable write watches ourselves for HTTP and NNTP, and only enable the write watch with EvCleanup because it's an "always on" watch.
2019-06-02 ds: drop unused EVENT: label in epoll code path
This was never used in Danga::Socket 1.61, either.
2019-06-02 ds: drop checks for invalid descriptors
I've used Danga::Socket for well over a decade in various projects at this point and have never seen the need for it. If such a bug ever happens, the process should fall over so it gets fixed ASAP.
2019-06-02 ds: drop set_writer_func support
This is not used by perlbal for OpenSSL support, either; and it does not appear to be the right layer for doing write translations anyways (IO::Socket::SSL uses `tie').
2019-06-02 ds: add a note about planned future changes
Sometimes I get bored with the email part of this project and need a distraction :P
2019-06-02 ds: drop more unused subs
ToClose and HaveEpoll are of no use to us and I see no future use for them, either.
2019-06-01 ds: fix and test for FD leaks with kqueue on ->Reset
Even though we currently don't use it repeatedly, ->Reset should close() kqueue FDs and not cause the process to run out of descriptors. Add a close-on-exec test while we're at it.
2019-06-01 ds: set close-on-exec flag on epoll descriptors
We should not be leaking these FDs to git(1) processes, in case git has a bug that causes it to access the wrong FD.
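For readers unfamiliar with the close-on-exec flag, the following Unix-only sketch shows how it is set and checked on a descriptor. It is written in Python as an illustration and is not project code; a pipe stands in for the epoll descriptor, and `set_cloexec` and `has_cloexec` are hypothetical helper names.

```python
import fcntl
import os

def set_cloexec(fd):
    """Set FD_CLOEXEC so the descriptor is closed automatically on exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

def has_cloexec(fd):
    """Report whether FD_CLOEXEC is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)

r, w = os.pipe()
os.set_inheritable(r, True)   # clear the flag to simulate a leaky descriptor
before = has_cloexec(r)
set_cloexec(r)
after = has_cloexec(r)
os.close(r)
os.close(w)
```

A descriptor with the flag set is not inherited by a spawned child such as git(1), so a bug in the child cannot touch it.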
2019-06-01 git: drop the deleted err_c file
No reason to leave that (usually) empty file open after killing off "cat-file --batch-check". This wasn't an unbounded leak, though, as respawning the --batch-check process would've clobbered the old err_c file.
2019-06-01 git: unconditional expiry
A constant stream of traffic to either httpd/nntpd would mean git-cat-file processes never expire. Things can go bad after a full repack, as a full repack will unlink old pack indices and git-cat-file does not currently detect unlinked files.

We could do something complicated by recursively stat-ing objects/pack of every git directory and alternate; but that's probably not worth the trouble compared to occasionally restarting the cat-file process.

So simplify the code and let httpd/nntpd expire them periodically, since spawning a "git-cat-file --batch" process isn't too expensive. We already spawn for every request which hits git-http-backend, cgit, and git-apply.

In the future, we may optionally support the Git::Raw module to avoid IPC; but we must remain careful to not leave lingering FDs open to unlinked files after repack.
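The difference between idle-based and unconditional expiry can be sketched as follows. This is a Python illustration under stated assumptions, not the project's implementation: the class `ExpiringWorker`, its `max_age` default, and the check-on-access design are all hypothetical (the commit describes periodic expiry driven by httpd/nntpd).

```python
import time

class ExpiringWorker:
    """Hypothetical wrapper: respawn a helper once it reaches max_age seconds."""

    def __init__(self, spawn, max_age=60.0, clock=time.monotonic):
        self._spawn = spawn        # callable returning an object with .close()
        self._max_age = max_age    # illustrative value, not the project's
        self._clock = clock
        self._worker = None
        self._born = 0.0

    def get(self):
        now = self._clock()
        # Expire on age since spawn, not on idle time, so steady traffic
        # cannot keep a stale worker alive indefinitely.
        if self._worker is not None and now - self._born >= self._max_age:
            self._worker.close()
            self._worker = None
        if self._worker is None:
            self._worker = self._spawn()
            self._born = now
        return self._worker
```

Because the age is measured from spawn time, a worker holding descriptors to unlinked pack files is replaced within `max_age` no matter how busy the server is.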
2019-05-31 viewdiff: avoid repeat variable expansion
This is worth a 1-2% speedup in t/perf-msgview.t rendering 2620 messages currently in https://public-inbox.org/meta/
2019-05-30 v2writable: short-circuit is_ancestor check on equality
We don't need to use git to check ancestry if object IDs match on a string comparison. This saves 100ms or so and brings the ~0.5s no-op time on lore.kernel.org/lkml down to ~0.4s.
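The short-circuit can be sketched as below, in Python for illustration only. The function signature, the injectable `run` parameter, and the use of `git merge-base --is-ancestor` for the slow path are assumptions of this example, not taken from the project's code.

```python
import subprocess

def is_ancestor(git_dir, cur, tip, run=subprocess.run):
    """Return True if commit `cur` equals `tip` or is an ancestor of it."""
    if cur == tip:
        return True  # identical object IDs: no need to spawn git at all
    cmd = ["git", f"--git-dir={git_dir}",
           "merge-base", "--is-ancestor", cur, tip]
    # Exit status 0 means "is an ancestor"; this sketch treats every
    # other status (including errors) as "not an ancestor".
    return run(cmd).returncode == 0
```

The string comparison costs almost nothing, whereas the fallback pays for a process spawn, which is where the saving in the no-op case comes from.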
2019-05-30 v2writable: avoid mm_tmp creation without regen
Creating mm_tmp is an expensive operation with large inboxes and can be avoided if there are no new messages to process.

Since git-fetch(1) currently lacks an --exit-code option(*), mirrors will run `public-inbox-index' unconditionally after fetch, which is an expensive op if it needs to duplicate a large SQLite DB.

This speeds up the mirror case of:

    git --git-dir=git/$EPOCH.git fetch && public-inbox-index

This reduces the no-op `public-inbox-index' time from over 8s to ~0.5s on a (currently) 7-epoch clone of https://lore.kernel.org/lkml/ on my system.

(*) WIP --exit-code for git-fetch: https://public-inbox.org/git/87ftphw7mv.fsf@evledraar.gmail.com/
2019-05-30 v2writable: hoist out index_epoch sub
This will make future changes easier to follow.
2019-05-30 v2writable: split off unindex_range mapping
It'll make it easier to detect if we have anything to unindex and run git-log on, at all.
2019-05-29 searchidx: store indexlevel=medium as metadata
And use it from Admin. It's easy to tell what indexlevel=basic is from unconfigured inboxes, but distinguishing between 'medium' and 'full' would require stat()-ing position.* files which is fragile and Xapian-implementation-dependent. So use the metadata facility of Xapian and store it in the main partition so Admin tools can deal better with unconfigured inboxes copied using generic tools like cp(1) or rsync(1).
2019-05-29 v2writable: show progress updates for index_sync
We can show progress whenever we commit changes to the FS.
2019-05-29 index: support --verbose option
It doesn't implement progress of batches, yet, but it wires up the parsing of the command-line while preserving output compatibility. This output is NOT meant to be stable.
2019-05-29 v2writable: move index_sync options to sync state
And use singular `opt' to be consistent with the common name of 'getopt'.
2019-05-29 v2writable: use prototypes for internal subs
Hopefully this improves maintainability by allowing Perl to do some arg checking for us.
2019-05-29 v2writable: localize unindex-range.$EPOCH to $sync state
We don't need to stuff that into $self (V2Writable) which can be longer-lived than a ->index_sync invocation.
2019-05-29 v2writable: move {ranges} into $sync state
Yet another temporary variable with no use outside of index_sync.
2019-05-29 v2writable: move {regen} into $sync state
regen is always enabled for index_sync nowadays (and has been for a while). Rename `index_prepare' to `sync_prepare' to show it's for ->index_sync; and not the online indexing we do for ->add.
2019-05-29 v2writable: move {reindex} field to $sync state
reindexing info is not used outside of the index_sync code path.