Date | Commit message |
|
We don't need to rely on Xapian search functionality for the
majority of the WWW code, even. subject_normalized is moved to
SearchMsg, where it (probably) makes more sense, anyways.
|
|
We only need it for tests that chdir, and maybe for ENV{PATH}
portability (dash seems fine, not sure about others).
v2: revert change to solver_git.t for FreeBSD 11.2 and document
|
|
We can revisit this, later; but Data::Dumper requires a separate
package for CentOS-7 users, at least.
|
|
CentOS-7 needs the perl-Data-Dumper package, and the
test is small enough to roll our own escaping, here.
|
|
PublicInbox::DS works for every platform we care about,
nowadays; so checking for it is a waste of time. Clean up a
few POSIX and Socket imports while we're in the area.
|
|
We were reindexing the full history every invocation of -index
when Xapian was not used because we were incorrectly relying on
'last_commit' metadata stored in Xapian.
Rewrite the indexing logic to be less confusing while we're
at it, since we rely on `git merge-base --is-ancestor' nowadays.
Furthermore, we need to handle message removals from the
overview index correctly when Xapian is not in use.
Co-authored-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
Avoiding reliance on environment variables is a bit cleaner
for writing tests.
|
|
|
|
* origin/danga-bundle:
DS: epoll: fix misordered EPOLL_CTL_DEL call
DS: drop unused "_undef" sub
syscall: drop readahead wrapper
build: do not manify DS and Syscall pods
DS: handle EINTR in IO::Poll path, too
DS: workaround IO::Kqueue EINTR (mis-)handling
DS: drop profiling support
DS: remove unused fields and functions
listener: use EPOLLEXCLUSIVE for listen sockets
bundle Danga::Socket and Sys::Syscall
|
|
This can help users track down the source of warnings
when presented with imperfect emails.
While we're at it, make the __WARN__ callback in t/v2writable.t
a no-op since we don't check for warnings, there.
|
|
FreeBSD does not allow non-root users to set S_ISGID;
so git skips this bit on FreeBSD and Debian/kFreeBSD
platforms.
|
|
These modules are unmaintained upstream at the moment, but I'll
be able to help with the intended maintainer once/if CPAN
ownership is transferred. OTOH, we've been waiting for that
transfer for several years, now...
Changes I intend to make:
* EPOLLEXCLUSIVE for Linux
* remove unused fields wasting memory
* kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615
* accept4 support
And some lower priority experiments:
* switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes)
* nginx-style buffering to tmpfile instead of string array
* sendfile off tmpfile buffers
* io_uring maybe?
|
|
This fixes a test failure on my Debian buster system.
Bug report filed for w3m to handle "&apos;":
https://bugs.debian.org/927409
and for "highlight" to favor "&#39;" in case other browsers fail:
https://bugs.debian.org/927410
|
|
A dangling parenthesis followed by trailing punctuation usually
means the parenthesis is not intended as part of the URL.
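As a rough illustration of this heuristic (a Python sketch, not the
project's Perl linkifier; `trim_url` and the punctuation set are
hypothetical), trailing punctuation and unbalanced closing parentheses
can be split off a URL candidate like so:

```python
# Hypothetical helper: characters treated as sentence punctuation
# rather than as part of a URL when they appear at the very end.
TRAILING_PUNCT = ".,;:!?"

def trim_url(candidate):
    """Return (url, trailer); trailer is the text split off the end."""
    url = candidate
    while url:
        last = url[-1]
        if last in TRAILING_PUNCT:
            # Sentence punctuation at the end is not part of the URL.
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # A closing parenthesis with no matching opener is dangling.
            url = url[:-1]
        else:
            break
    return url, candidate[len(url):]

print(trim_url("http://example.com/foo)."))
print(trim_url("http://example.com/wiki/Foo_(bar)."))
```

Balanced parentheses inside the URL are kept; only an unmatched
closing parenthesis at the end is dropped.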
|
|
The URLs at the top of WwwStream.pm weren't getting linkified
correctly.
|
|
This will be used for generating an HTML listing for v1 inboxes,
at least. The logic for this follows that of grokmirror,
and we may dynamically generate manifest.js.gz natively...
|
|
'$inbox' is more human-readable, so that is for the more
human-readable name in most cases. Making our variable naming
more consistent should make the code easier to review and
harder to screw up.
|
|
Our high-level config already treats single limits as a
soft==hard limit for limiters; so stop handling that redundantly
in the low-level spawn() sub.
|
|
We'll be spawning cgit and git-diff, which can take gigantic
amounts of CPU time and/or heap given the right (ermm... wrong)
input. Limit the damage that large/expensive diffs can cause.
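A minimal sketch of this kind of limit, written in Python rather than
the project's Perl. `run_limited` and `as_soft_hard` are hypothetical
names, and treating a single number as soft == hard is an assumption
made for this example:

```python
import resource
import subprocess
import sys

def as_soft_hard(limit):
    """A single number means soft == hard; a pair is passed through."""
    if isinstance(limit, int):
        return (limit, limit)
    soft, hard = limit
    return (soft, hard)

def run_limited(argv, cpu_seconds, address_space_bytes=None):
    """Run argv with RLIMIT_CPU (and optionally RLIMIT_AS) set in the child."""
    def set_limits():
        # Runs in the child between fork() and exec(), so the parent
        # process keeps its own limits.
        resource.setrlimit(resource.RLIMIT_CPU, as_soft_hard(cpu_seconds))
        if address_space_bytes is not None:
            resource.setrlimit(resource.RLIMIT_AS,
                               as_soft_hard(address_space_bytes))
    return subprocess.run(argv, preexec_fn=set_limits,
                          capture_output=True, text=True)

if __name__ == "__main__":
    child = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU))"
    print(run_limited([sys.executable, "-c", child], 5).stdout.strip())
```

The kernel then signals a child that exceeds its CPU budget, so an
expensive diff cannot run indefinitely.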
|
|
This will be useful for extracting titles/subjects from
commit objects when displaying commits.
|
|
We were relying on Danga::Socket using the "bytes" pragma,
previously. Nowadays, the "bytes" pragma is not recommended in
general, but bytes::length remains acceptable for getting the
byte-size of a scalar.
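The distinction between character count and byte size can be shown
with a small Python analogue (not Perl; `byte_size` is a hypothetical
helper and UTF-8 is an assumed encoding):

```python
def byte_size(text, encoding="utf-8"):
    """Number of bytes the string occupies once encoded."""
    return len(text.encode(encoding))

subject = "na\u00efve"  # five characters, one of them non-ASCII
print(len(subject), byte_size(subject))  # prints "5 6"
```

Anything that counts bytes on the wire needs the second number, not
the first.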
|
|
WwwStream started depending on the WWW::style method
for configurable CSS, so mock ::style so the benchmark
runs properly.
Fixes: f026dbdd392c9dd5 ('www: admin-configurable CSS via "publicinbox.css"')
|
|
No point in making noise about something that isn't used.
|
|
This is compatible with Markdown; but we still keep the WYSIWYG
nature of plain-text with this. This is only intended for use
with our documentation. Enabling any type of Markdown support
for emails can lead to incompatibilities or interoperability
problems with alternative implementations.
|
|
It turns out there's no point in having multiple instances of
this or having to worry about destruction or destruction
ordering.
This will make it easier to reuse the one instance we have
across different modules.
|
|
We'll want to use this to support highlighting syntax used by
Markdown and possibly other markup languages (while retaining
the raw plain-text layout and formatting).
|
|
Favor in-place utf8::decode since it's a bit faster without
method dispatch overhead; and don't care about validity just
yet.
HlMod->do_hl itself should return "utf8" strings, since other
parts of our code can use it, so it's not the job of ViewVCS to
post-process HlMod output.
|
|
This is the fallback for the normal WWW endpoint.
Adding this to the top-level seems to be alright, since lynx and
w3m both understand nntp://<HOSTNAME>/<Message-ID> anyways.
If newsgroup and inbox names conflict, then consider it the
fault of the original sender.
Since NewsWWW is intended to support buggy linkifiers in mail clients,
they can interpret nntp:// URLs as http://<HOSTNAME>/<Message-ID>.
Inbox ordering from the config file is preserved since
commit cfa8ff7c256e20f3240aed5f98d155c019788e3b
("config: each_inbox iteration preserves config order"),
so admins can rely on that to configure how scanning
works.
Requested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
cf. https://public-inbox.org/meta/20190107190719.GE9442@pure.paranoia.local/
nntp://news.public-inbox.org/20190107190719.GE9442@pure.paranoia.local
|
|
Sometimes users will write "http://example.com" without the
trailing slash, which every browser and tool I've tested seems
to understand.
|
|
We'll use HTML attributes + anchor links to link to filenames
in coming commits.
|
|
* origin/purge:
implement public-inbox-purge tool
v2writable: read epoch on purge
v2writable: cleanup processes when done
v2writable: purge ignores non-existent git epoch directories
v2writable: ->purge returns undef on no-op
import: purge: reap fast-export process
hoist out resolve_repo_dir from -index
|
|
This will make it easier to make command-line tools
from SolverGit.
|
|
public-inbox can only index the abbreviated object_ids in
emails, not the full or even longer-than-necessary object_ids.
So retry failed object_ids if they're longer than 7 hex
characters.
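One possible shape for such a retry, sketched in Python. The `lookup`
callable, `resolve_abbrev`, and the one-character-at-a-time shortening
are assumptions made for illustration; only the 7-character floor
comes from the message above:

```python
def resolve_abbrev(oid, lookup, min_len=7):
    """Try oid, then progressively shorter prefixes down to min_len.

    lookup is a hypothetical callable returning a result for an
    indexed abbreviation, or None when nothing matches.
    """
    candidate = oid
    while True:
        found = lookup(candidate)
        if found is not None:
            return found
        if len(candidate) <= min_len:
            # Already at (or below) the shortest abbreviation; give up.
            return None
        candidate = candidate[:-1]

# Stand-in index containing only a 7-character abbreviation.
index = {"abcdef1": "patch-1"}
print(resolve_abbrev("abcdef1234567890", index.get))
```

Each failed lookup costs one extra query, so the floor keeps the
number of retries bounded.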
|
|
Otherwise, long-running but idle git processes may keep unlinked
packs around indefinitely and waste disk space.
|
|
We need to ensure we don't introduce unnecessary processes
and memory usage for mapping multiple inboxes to the same
code repos.
|
|
* origin/viewvcs: (66 commits)
solvergit: deal with alternative diff prefixes
solvergit: extract mode from diff headers properly
solvergit: avoid "Wide character" warnings
solvergit: do not show full path names to "git apply"
css/216dark: add comments and tweak highlight colors
viewvcs: avoid segfault with highlight.pm at shutdown
solvergit: do not solve blobs twice
t/check-www-inbox: disable history
t/check-www-inbox: don't follow mboxes
t/check-www-inbox: replace IPC::Run with PublicInbox::Spawn
hval: add src_escape for highlight post-processing
viewvcs: wire up syntax-highlighting for blobs
hlmod: disable enclosing <pre> tag
t/hl_mod: extra check to ensure we escape HTML
wwwhighlight: read_in_full returns undef on errors
solver: crank up max patches to 9999
viewvcs: do not show final error message twice
qspawn: decode $? for user-friendliness
solver: reduce "git apply" invocations
solver: hold patches in temporary directory
...
|
|
WWW::Mechanize keeps an infinitely large stack, which was
leading to OOM errors on my system.
|
|
They can be extremely large with no limit, so can lead to OOM
errors.
|
|
Because WWW::Mechanize uses a truckload of memory, fork
needs to prepare all that memory for CoW, which ends up
bailing with ENOMEM.
|
|
Looking at git@vger history, several emails had broken
References and In-Reply-To headers containing <y>, <n> and
email addresses in place of Message-IDs.
This was causing too many unrelated messages to be linked
together in the same thread.
|
|
We need to post-process "highlight" output to ensure it doesn't
contain odd bytes which cause "wide character" warnings or
require odd glyphs in source form.
|
|
We already have a <pre> tag in ViewVCS, and nesting <pre>
inside the pre-existing <pre> overrides the "white-space:pre"
we use to align line numbers.
|
|
Otherwise, it's open season on our users :<
|
|
The raw value of $? isn't very useful, generally.
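For reference, a wait status of this kind packs the exit code into the
high byte and the terminating signal into the low seven bits, with bit
0x80 marking a core dump. A Python sketch of decoding it follows;
`describe_wait_status` is a hypothetical name, not the routine used
by Qspawn:

```python
import signal

def describe_wait_status(status):
    """Turn a 16-bit wait status (as in Perl's $?) into readable text."""
    sig = status & 0x7F           # low 7 bits: terminating signal, if any
    core = bool(status & 0x80)    # bit 7: core dump flag
    code = (status >> 8) & 0xFF   # high byte: exit code
    if sig:
        try:
            name = signal.Signals(sig).name
        except ValueError:
            name = "signal %d" % sig
        return "killed by %s%s" % (name, " (core dumped)" if core else "")
    return "exited with status %d" % code

print(describe_wait_status(256))  # a child that called exit(1)
print(describe_wait_status(9))    # a child killed by signal 9
```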
|
|
The psgi_qx routine in the now-abandoned "repobrowse" branch
allows us to break down blob-solving at each process execution
point. It reuses the Qspawn facility for git-http-backend(1),
allowing us to limit parallel subprocesses independently of Perl
worker count.
This is actually 2-3% slower than fully-synchronous execution;
but it is fair to other clients as it won't monopolize the server
for hundreds of milliseconds (or even seconds) at a time.
|
|
|
|
Otherwise, temporary GDBM files don't get unlinked
when I SIGINT the process.
|
|
I'll probably expose the PSGI service for cgit;
but it could be useful to others as well.
|
|
Oops, I might've left it out, somewhere.
|
|
For cross-inbox Message-ID resolution, having some sort of
stable ordering makes the most sense. Relying on the
order of the config file seems most natural and allows us
to avoid introducing yet another configuration knob.
|