2020-04-19 reduce scope of mbox From_ line removal
It's unnecessary overhead for anything which does Email::MIME parsing. It was never done for v2 indexing, even though v1->v2 conversions did NOT remove those From_ lines. There was never a need to remove From_ lines in the v1 SearchIdx paths, either. Hitting a /$INBOX_URL/$MSGID/T/ endpoint with an 18 message thread reveals a ~0.5% speed improvement. This will become more apparent when we have a faster MIME parser.
2020-04-19 mbox: use per-message line-ending for From_ line
Email::Simple preserves the message line ending in headers, so make the From_ line consistent with the rest of the headers.
2020-04-19 wwwatomstream: move {emit_header} field to $self
There's no need to pollute the cross-package $ctx with it.
2020-04-19 favor `do {}' over `eval {}' for localized slurp
I did not know to use the return value of `do' back in the day. There's probably no practical difference in these cases, but `eval' is overkill for these uses and may hide actual errors. We can get rid of a few redundant `scalar' ops and pass scalar refs to Email::MIME->new to avoid copies in a few more places, too.
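A minimal sketch of the localized-slurp idiom this commit refers to. The helper name `slurp_file` is hypothetical and not taken from public-inbox; it only illustrates using the return value of `do {}` instead of wrapping the read in `eval {}`.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one scalar.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "open $path: $!";
    # `local $/' only lasts until the block exits; the value of the
    # last statement (the slurped file) becomes the value of `do'.
    my $str = do { local $/; <$fh> };
    close($fh) or die "close $path: $!";
    return $str;
}
```

Unlike `eval {}`, the `do {}` block does not trap exceptions, so a genuine error still propagates to the caller.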
2020-04-19 inbox: replace `eval {}' with `do {}' where appropriate
-Git->new and -Limiter->new will never fail unless there's an OOM, so using `eval' is incorrect.
2020-04-19 inbox: don't memoize missing description|cloneurl
It's probably common to have inboxes initially set up without these files properly configured, so don't memoize at that stage.
2020-04-19 searchidx: die on cat-file failures
We always use the object ID from "git <log|rev-list>" for retrieving blobs, so fail loudly if the git repository is corrupt instead of silently continuing.
2020-04-19 inboxwritable: mime_from_path: reuse in more places
There's nothing Maildir-specific about the function, so `maildir_path_load' was a bad name. So give it a more appropriate name and use it in our tests. This saves us some code and inconsistency by reusing an existing internal library routine in more places. We can drop the "From_" line in some of our (formerly) mbox sample files.
2020-04-17 searchthread: reduce indirection by removing container
We can rid ourselves of a layer of indirection by subclassing PublicInbox::Smsg instead of using a container object to hold each $smsg. Furthermore, the `{id}' vs. `{mid}' field name confusion is eliminated. This reduces the size of the $rootset passed to walk_thread by around 15%, which is over 50K of memory when rendering a /$INBOX/ landing page.
2020-04-17 doc: update 1.4.0 relnotes with date, start 1.5.0
2020-04-17 public-inbox 1.4.0
2020-04-17 t/httpd-unix: skip some tests w/o signalfd|EVFILT_SIGNAL
Some of these tests just don't seem reliable enough with the way we or Perl do portable signal handling.
2020-04-16 t/httpd-corner: improve reliability and diagnostics
The graceful-shutdown-on-PUT test is unreliable because we can't rely on a FIFO as we do with the GET tests. So increase the delay to 100ms since that seems enough on my system even with CONFIG_HZ=100. Add a timeout and backtrace to the $check_self sub to help with further diagnostics while we're at it, too. It would be nice if there were a portable syscall tracing mechanism we could attach to the -httpd process to make the test more deterministic...
2020-04-15 t/httpd-corner.t: relax read-after-failed-write handling
I've observed FreeBSD 11.2 read(2) having one of three behaviors after a failed write(2) on a socket:
1) returning number of bytes read
2) failing with ECONNRESET
3) returning with EOF
1) is the most common, and I've only seen 1) on Linux. It may be possible to use SO_LINGER or shutdown(2) to ensure 1) always happens, but SO_LINGER behavior seems inconsistent across OSes, especially with non-blocking sockets. Since these tests are corner-cases where we're dealing with broken/malicious clients, let's continue spending the fewest syscalls protecting ourselves in the daemon and instead make the client-side test code tolerate more socket implementations.
2020-04-15 t/*.t: localize $SIG{__WARN__} changes
We don't want to propagate %SIG changes to other tests when running multiple tests within the same process via t/run.perl.
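The localization described above might look roughly like the following. `capture_warnings` is a hypothetical helper written for illustration, not a routine from the public-inbox test suite.

```perl
use strict;
use warnings;

# Hypothetical test helper: run $code and return the warnings it emitted.
sub capture_warnings {
    my ($code) = @_;
    my @warn;
    # `local' restores the previous handler when this sub returns, so
    # other tests running in the same process never see our handler.
    local $SIG{__WARN__} = sub { push @warn, @_ };
    $code->();
    return @warn;
}
```

A plain `$SIG{__WARN__} = sub {...}` assignment would instead stay in effect for every test that later runs in the same interpreter.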
2020-04-15 dskqxs: ignore EV_SET errors on EVFILT_WRITE
Just like the EPOLL_CTL_ADD emulation path, the EPOLL_CTL_MOD and EPOLL_CTL_DEL emulation paths can fail if attempting to install an EVFILT_WRITE for a read-only pipe. I've only observed this on the EPOLL_CTL_DEL emulation path, but I suspect it could happen on the EPOLL_CTL_MOD path as well. Increasing the number of read-only pipes we rely on with altid exports via sqlite3 made this old bug more apparent and reproducible while looping the test suite. This may be adjusted in the future to deal with write-only pipes, but we currently don't have any of those watched by kqueue.
2020-04-15 testcommon: DESTROY: wait for killed daemon
Otherwise, the waitpid(-1, 0) call in Xapcmd::process_queue() may reap it in a subsequent test when using t/run.perl to reuse processes for testing. While we're at it, make Xapcmd::process_queue warn about unknown PIDs in case other PIDs leak through to us in the future.
2020-04-15 MANIFEST update
2020-04-13 doc: add technical/whyperl
Some people don't like Perl; but it exists, there's no avoiding it with everything that depends on it. And nearly all code still works unmodified after 20 years.
2020-04-13 doc: start reproducibility document
Not new ideas, just gathering thoughts.
2020-04-12 doc: escape internal ">" in listid code snippet
A code snippet in the listid description is incorrectly rendered as "publicinbox.$NAME.watchheader=List-Id:<foo.example.com"> Escape the closing bracket around the List-Id value to avoid this. Also escape the opening bracket for symmetry/readability.
2020-04-09 t/httpd-unix: improve test reliability
Net::Server::Daemonize::create_pid_file does not write the PID file atomically, so we need to barf if it's incomplete.
2020-04-09 triewyde: ficks soem speling errrors
Dikshunarees R gude!
2020-04-09 tests: document run_mode=1 as not implemented
It was implemented at some point, but it was more things to support and the worst of both worlds: both unrealistic compared to real-world use and slower than run_mode=2. Noticed while looking for speling erorrs.
2020-04-07 view: do not redundantly obfuscate addresses
We shouldn't rerun the address obfuscator on data we've already run through. Instead, run through the unescaped text part and apply the UTF-8 "\x{2022}" substitution before it hits HTML escaping.
Fixes: 9bdd81dc16ba6511 ("view: msg_iter calls add_body_text directly")
2020-04-07 portability: constants for NetBSD
NetBSD implements O_CLOEXEC, so let us use it to avoid inadvertent FD sharing. It also has the same value for SIGWINCH as Linux and the other BSDs we support.
2020-04-07 xt/perf-msgview: update to use git->cat_async
It's about 5-10% faster on an SMP machine with an SSD, even on a hot Linux page cache.
2020-04-06 examples/grok-pull.post_update_hook: move url_base to the top
Users are encouraged to edit this script, anyways, so make it easy for them to swap out and use whatever URL they need.
2020-04-06 examples/grok-pull.post_update_hook: capture infourl
The values of infourl parameters are shared in the config, so include them in the mirror.
2020-04-06 examples/grok-pull.post_update_hook: fetch mirror description
The $INBOX_URL/description endpoint is available since v1.3.0, so use it in mirrors.
2020-04-05 git: reduce stat buffer storage overhead
The stat() array is a whopping 480 bytes (on x86-64, Perl 5.28), while the new packed representation of two 64-bit doubles as a scalar is "only" 56 bytes. This can add up when there's many inboxes. Just use a string comparison on the packed representation. Some 32-bit Perl builds (IIRC OpenBSD) lack quad support, so doubles were chosen for pack() portability.
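As a rough sketch of the packed representation described above: the helper names `stat_sig` and `stat_changed` are hypothetical, and the choice of ctime and size as the two packed fields is an assumption made for illustration, since the commit message does not say which stat fields are kept.

```perl
use strict;
use warnings;

# Hypothetical helper: fingerprint a file as two packed doubles.
# Which stat fields are kept (ctime and size here) is an assumption.
sub stat_sig {
    my ($path) = @_;
    my @st = stat($path) or return;
    # 'd' is a native double, which avoids relying on 64-bit
    # integer ("quad") support in pack()
    return pack('dd', $st[10], $st[7]);
}

# True if the file is gone or its fingerprint differs from $old.
sub stat_changed {
    my ($old, $path) = @_;
    my $cur = stat_sig($path);
    return !defined($cur) || $cur ne $old; # plain string comparison
}
```

Keeping one short packed string per inbox instead of a full stat() array is what yields the per-inbox savings the commit mentions.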
2020-04-05 mbox: halve ->getline "context switches"
We don't need to take extra trips through the event loop for a single message (in the common case of Message-IDs being unique). In fact, holding the body reference left behind by Email::Simple could be harmful to memory usage, though in practice it's not a big problem since code paths which use Email::MIME take far more.
2020-04-05 release large (non ref) scalars using `undef $sv'
Using `undef EXPR' like a function call actually frees the heap memory associated with the scalar, whereas `$sv = undef' or `$sv = ""' will keep the buffer around until $sv goes out of scope. The `sv_set_undef' documentation in the perlapi(1) manpage explicitly states this:
    The perl equivalent is "$sv = undef;". Note that it doesn't free any string buffer, unlike "undef $sv".
And I've confirmed this by reading Dump() output from Devel::Peek. We'll also inline the old index_body sub in SearchIdx.pm to make the scope of the scalar more obvious. This change saves several hundred kB RSS on both -index and -httpd when hitting large emails with thousands of lines.
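One way to observe the effect without reading Dump() output is to ask the core B module for the allocated buffer length. This is only an illustrative sketch: `buf_len` is a hypothetical helper, and the commit itself used Devel::Peek rather than B.

```perl
use strict;
use warnings;
use B ();

# Illustrative helper (not from public-inbox): size of the string
# buffer perl currently has allocated for the referenced scalar.
sub buf_len {
    my ($ref) = @_;
    my $obj = B::svref_2object($ref);
    return $obj->can('LEN') ? $obj->LEN : 0;
}

my $big = '';
$big .= "line $_\n" for 1 .. 10_000; # grow a private buffer
print "allocated before: ", buf_len(\$big), "\n";
undef $big; # releases the string buffer immediately
print "allocated after:  ", buf_len(\$big), "\n";
```

Replacing `undef $big` with `$big = undef` in the sketch is a simple way to compare the two forms on a given Perl build.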
2020-04-05 xt/msgtime_cmp: fix false positives from msgtime change
commit d857e7dc0d816b635a7ead09c3273f8c2d2434be ("msgtime: assume +0000 if TZ missing when using Date::Parse") introduced a behavior change which causes false positives when compared to the old code. Update the "old" implementation to match this overdue behavior change.
2020-04-05 wwwstatic: set "Vary: Accept-Encoding" in static gzip response
We don't want to confuse intermediate caches into serving gzipped content to any clients which can't handle it. It probably doesn't matter in practice, though, since every HTTP client seems to handle "Content-Encoding: gzip" regardless of whether it was requested or not, though I could expect some nc/socat/telnet/s_client users being annoyed. This also matches the behavior of Plack::Middleware::Deflater and other deflater implementations.
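A minimal PSGI-style sketch of such a response follows. The sub name `static_gz_response` and its arguments are hypothetical; only the header names and values come from the commit message and HTTP itself.

```perl
use strict;
use warnings;

# Hypothetical helper: build a PSGI response for a pre-compressed body.
sub static_gz_response {
    my ($gz_body, $content_type) = @_;
    return [ 200, [
        'Content-Type' => $content_type,
        'Content-Encoding' => 'gzip',
        # tell intermediate caches the body depends on Accept-Encoding
        'Vary' => 'Accept-Encoding',
        'Content-Length' => length($gz_body),
    ], [ $gz_body ] ];
}
```

A cache that honors `Vary` keys its stored copy on the request's Accept-Encoding value, so a client that never asked for gzip is not handed the compressed variant.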
2020-04-04 view: inline flush_quote sub
No point in having an extra sub for a short, commonly called function in the same file.
2020-04-04 viewdiff: reduce sub parameter count
We're slowly moving towards doing all of our output buffering into a single buffer, so passing that around on the stack as a dedicated parameter is confusing.
2020-04-04 view: dedupe_subject: allow "0" as a valid Subject
While rare in practice (even by spammers), a single "0" could theoretically be the entire contents of a Subject line. So use the Perl 5.10+ defined-or operator to improve correctness of subject deduplication.
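The difference between `||` and the defined-or operator can be shown in a few lines. Both subs below are hypothetical illustrations and are not the actual dedupe_subject code.

```perl
use strict;
use warnings;

# `||' tests truthiness, so a Subject of "0" is wrongly replaced.
sub subject_or_default_old {
    my ($subj) = @_;
    return $subj || '(no subject)';
}

# `//' (Perl 5.10+) only falls back when the value is undefined.
sub subject_or_default {
    my ($subj) = @_;
    return $subj // '(no subject)';
}
```

Note that `//` also keeps an empty string, which `||` would replace; whether that matters depends on the caller.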
2020-04-04 view: use defined-or operator to simplify checks
We depend on Perl 5.10 features in other places. Shorten the lifetime of the `$desc' scalar while we're at it.
2020-04-04 view: note we assume UTF-8 on unknown encodings
Clarify that we're assuming the text is UTF-8, since users may have no idea how it's mangled.
2020-04-04 inboxwritable: fix From_ line unescaping
We can't rely on Email::MIME noticing the change to our scalar ref after calling `PublicInbox::MIME->new'. This is because Email::MIME::body_set (unlike Email::Simple::body_set) will copy the contents of the body into `->{body_raw}' as a new scalar. Furthermore, we need to escape multiple From lines in the body, not just the first one, using the `g' modifier to `s//'.
Reported-by: Kyle Meyer <kyle@kyleam.com>
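The role of the `g` modifier can be sketched as follows, assuming mboxrd-style quoting in which one ">" was prepended to every body line matching /^>*From /. The sub name and the exact regexp are illustrative assumptions, not the code from this commit.

```perl
use strict;
use warnings;

# Hypothetical helper: strip one level of ">" quoting from every
# ">From " line of a body, modifying the referenced scalar in place.
sub unescape_from_lines {
    my ($bref) = @_;
    # /m lets ^ match at each line start; /g handles every matching
    # line instead of stopping after the first one
    $$bref =~ s/^>(>*From )/$1/gm;
    return;
}
```

Without `/g`, only the first quoted line in the body would be rewritten and later ones would keep their extra ">".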
2020-04-03 quiet "Complex regular subexpression recursion limit" warnings
These seem mostly harmless since Perl will just truncate the match and start a new one on a newline boundary in our case. The only downside is we'd end up with redundant <span> tags in HTML. Limiting the number of lines matched ourselves with `{1,$NUM}' doesn't seem prudent since lines vary in length, so we continue to defer the job of limiting matches to the Perl regexp engine. I've noticed this warning in practice on 100K+ line patches to locale data.
2020-04-03 view: handle the topic-free case properly
There may be no topics for a given timestamp range, so don't attempt to treat `undef' as an arrayref.
2020-04-02 nntp: allow multiple spaces or tabs to delimit args
While this is not a known problem in practice, RFC 3977 section 3.1 states: Keywords and arguments MUST each be separated by one or more space or TAB characters.
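A small sketch of argument splitting that follows the quoted RFC 3977 rule. `split_nntp_line` is a hypothetical name and not the parser used by the NNTP server in this project.

```perl
use strict;
use warnings;

# Hypothetical helper: split an NNTP command line into its keyword
# and arguments, accepting any run of spaces and/or TABs between them.
sub split_nntp_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//; # drop the line terminator
    return split(/[ \t]+/, $line);
}
```

For example, `split_nntp_line("GROUP \t misc.test\r\n")` yields the keyword `GROUP` and the single argument `misc.test`.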
2020-04-02 mid: add $MID_EXTRACT regexp for export
This allows us to consistently enforce the same Message-ID extraction rules everywhere and makes it easier for us to make changes in the future. Update scripts/ssoma-replay, as well, but don't rely on PublicInbox::* modules in that since it's legacy and public-inbox was never a dependency of ssoma.
2020-04-02 searchidx: v1: skip mid_clean on mid_mime results
We do not need to run mid_clean() since mid_mime() uses mids() to extract the msgid from inside the angle brackets.
2020-04-02 smsg: inline _extract_mid functionality
No need to keep an extra sub which isn't called anywhere else, and the mid_clean call is redundant since mid_mime already plucks the msgid out of the angle brackets.
2020-04-02 README: add a missing "be"
2020-04-01 README: expand on the GUI non-requirement
It may not be immediately obvious why we should value text-based stuff so much, so clarify that.
2020-04-01 doc: update notes and HACKING ahead of 1.4 release
There will probably be a 1.4 release in a few days...