about summary refs log tree commit homepage
path: root/lib/PublicInbox/ViewDiff.pm
DateCommit message (Collapse)
2023-10-09www_coderepo: fix handling of non-UTF-8 git data
We can't assume git output is UTF-8, and we'll always have legacy data in git coderepos. So attempt to display some some garbled text rather than nothing at all if Perl croaks on it. sox commit c38987e8d20505621b8d872863afa7d233ed1096 (Added raw inverse-bit u-law and A-law support. Updated *.txt files., 2001-12-13) is an example of a commit which caused problems for me.
2022-10-02viewdiff: fix parts of diff being appended after signature
I'm not sure what kind of brain fart introduced this in c1e7a048be9d32cd, but it happened :x. We'll undef the $x variable ASAP to save memory and make future errors like this one more noticeable. Fixes: c1e7a048be9d ("www: viewdiff: fix UTF-8 names inside mbox attachments")
2022-09-26viewdiff: save memory by eliminating two captures
Avoid relying on $DIGIT captures when @- and @+ to access last match start and end, respectively. The elimination of the post capture ought to allow the use of sv_chop to advance the string start pointer without memory copies. This ought to save 1-2MB of memory on my system since I've noticed the captures was using a big chunk of scratchpad space.
2022-09-12www: viewdiff: fix UTF-8 names inside mbox attachments
This avoids `Wide character in print' warnings and ensures the UTF-8 characters in `Signed-off-by' trailers are properly rendered in HTML even when attempting to decode and display application/octet-stream mbox attachments as HTML. Linkification and reconstruction for coderepos is probably still broken, but that is a much bigger task to fix, I think. Fixes: ab9c03ff4aa369b3 ("www: use PerlIO::scalar (zfh) for buffering")
2022-09-10viewdiff: diff_header: shorten function, slightly
It makes for easier reading with gigantic fonts.
2022-09-10viewdiff: diff_before_or_after: avoid extra capture
/(.*?)\z/ will capture the "$X insertions(+), $Y deletions(-)" bit anyways, along with whatever extra notes before the /^diff --git / line. So just rely on /(.*?)\z/ and avoid the special case before it.
2022-09-10www: use PerlIO::scalar (zfh) for buffering
Calling Compress::Raw::Zlib::deflate is fairly expensive. Relying on the `.=' (concat) operator inside ->zadd operator is faster, but the method dispatch overhead is noticeable compared to the original code where we had bare `.=' littered throughout. Fortunately, `print' and `say' with the PerlIO::scalar IO layer appears to offer better performance without high method dispatch overhead. This doesn't allow us to save as much memory as I originally hoped, but does allow us to rely less on concat operators in other places and just pass a list of args to `print' and `say' as a appropriate. This does reduce scratchpad use, however, allowing for large memory savings, and we still ->deflate every single $eml.
2022-09-10www: switch to zadd for the majority of buffering
This allows us to focus string concatenations in one place to allow Perl internal scratchpad optimizations to reuse memory. Calling Compress::Raw::Zlib::deflate repeatedly proves too expensive in terms of CPU cycles.
2022-09-10view: switch a few things to ctx->zmore
Unfortunately, this is actually slower. However, this hopefully makes it easier to improve the internals and make performance improvements down the line.
2022-09-10viewdiff: diff_hunk: shorten conditionals, slightly
I'm not sure if Devel::Size::total_size can be trusted due to the regexps and crashes[1], but when it works, it's showing around a 900 byte size reduction, too. [1] https://rt.cpan.org/Public/Bug/Display.html?id=96421
2022-09-10viewdiff: reuse existing string in diff_before_or_after
Instead of appending to an ever-growing {obuf}, we'll reuse the existing string (which already has pre-allocated memory).
2022-09-10www: viewdiff: use return value for diff_hunk
It's only a short string, so there's not much copy overhead, and it'll make future changes easier to reason about.
2022-08-29viewvcs: add tree view
This also includes some glossary definitions to help users unfamiliar with git understand the relationship between trees and blobs.
2022-08-29www: atom: fix "changed" href to nowhere
The HTML generated for the Atom feed doesn't have the footer of /T/ and /t/ HTML-only views, so just make "changed" in the diffstat go directly to the permalink #related anchor. Fixes: 66512e177390 ("view: generate query in single-message and commit views")
2022-08-23viewdiff: linkify diffstats for non-format-patch emails
Some folks unfortunately use "git diff --stat -p" to generate patches. These messages lack the /^---$/ line and causing diffstats to not get linkified properly. We now treat the /^---$/ as optional and rely on the presence of file lines with / \| / proceeding a /\d+ files? changed,/ line.
2022-08-23view: generate query in single-message and commit views
The dfblob: search prefix is probably under-utilized, but is extremely powerful IMHO. To make it easier-to-use, add a search textarea with it prefilled with values for the existing patch message. This allows users to easily run a query for all patches which alter or result in either pre or post-image blobs in the current patch. Behavior changes are as follows: "changed" in the diffstat jumps to the bottom of the message. For /T/ and /t/, it goes to the "related" anchor which is just above the reply instructions in the single-message view. For the single message view, it'll jump to the textarea search form. I initially wanted to use a normal `<a href=' link, but figured the textarea is advantageous for two reasons: 1) users should be able to edit the query before submitting 2) crawlers are less likely to waste CPU/disk on forms It's probably too noisy to add this directly to the /T/ and /t/ views, but seems like a good place to put above the reply instructions in the single message view. Note that the queries used by the /$COMMIT_OID/s/ view is subtly different than the /$MSGID/ view since git will lengthen its abbreviations over time, while emails are immutable. I tried adding dfn: (filename) and s: (subject) support, but couldn't come up with cases where it really made sense for /$MSGID/. /$COMMIT_OID/s/ may benefit from it, since patchid: could be flaky due to non-standard diff generation options.
2022-04-01viewdiff: use defined checks in more places
It's less cognitive overhead for future readers since I just looked at it again and thought it was possible for "0" to be returned (it isn't).
2021-05-28viewdiff: escape '{' and '}' for regexp
Perl 5 doesn't warn on this, yet, but it warns on unescaped '(' and ')' nowadays, so it's conceivable Perl could start warning on this in the future. So future-proof our code and reduce reader confusion.
2021-05-28viewdiff: make $UNSAFE a variable
There's no sense in using a constant here since it gets copied into the uri_escape_utf8 function anyways. Furthermore, inlined constants still leave behind a subroutine and subs cost several KB of memory. Finally, add a comment as to why it's different than the default escape, since I just spent a minute wondering that.
2021-04-28view_diff: minor coding style fixes
Prefer "use v5.10", s/base/parent/, rely on "perl -w" for warnings. We also pass a regexp to the split perlop rather than literal SV, since split() will compile a new RE every time.
2021-04-27lei q + lcat: support --format=text output
This is mainly for "lei lcat" where it's the default, but I find it useful anyways compared to the JSON view. Colors are loaded from ~/.config/lei/config, and fall back to using diff colors from a normal git config (e.g. ~/.gitconfig).
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-09-16treewide: relax allow >=40 chars for git OID
This will help with eventual git SHA-256 transitions.
2020-05-09viewdiff: don't increment the reported hunk line number
For a diff hunk starting at line N, diff_hunk() constructs the link with "#n(N + 1)". This sends the viewer one line below the first context line. Although this is minor and may not even be noticed, there's not an obvious reason to increment the line number, so switch to using the reported value as is.
2020-05-07viewdiff: stricter highlighting and linkification check
Sometimes senders draw ASCII tables and such which we get fooled into attempting highlighting and diffstat anchoring. We now require 3 consecutive diff header lines: /^--- /, /^\Q+++\E /, and /^@@ / to enable diff highlighting (whether generated with git or not). The presence of a line matching /^diff / is not sufficient or even useful to us for highlighting diffs, since that could just be part of a line-wrapped sentence. However, we'll now check for the presence of a line matching /^diff --git / before enabling diffstat anchors. Otherwise cover letters for a patch series may fool us into creating anchors for diffstats.
2020-05-07viewdiff: assume diffstat and diff order are identical
For non-malicious messages, we can assume the diffstat and actual diff appear in the same order. Thus we can store {-long_paths} as an arrayref and only compare the first element when we encounter a truncated path. This should make HTML rendering stable when there's basename conflicts in message such as https://lore.kernel.org/backports/1393202754-12919-13-git-send-email-hauke@hauke-m.de/ This diffstat anchor linkification can still be defeated by users who make actual path names beginning with "...", but we won't waste CPU cycles on it, either.
2020-04-05release large (non ref) scalars using `undef $sv'
Using `undef EXPR' like a function call actually frees the heap memory associated with the scalar, whereas `$sv = undef' or `$sv = ""' will hold the buffer around until $sv goes out of scope. The `sv_set_undef' documentation in the perlapi(1) manpage explicitly states this: The perl equivalent is "$sv = undef;". Note that it doesn't free any string buffer, unlike "undef $sv". And I've confirmed by reading Dump() output from Devel::Peek. We'll also inline the old index_body sub in SearchIdx.pm to make the scope of the scalar more obvious. This change saves several hundred kB RSS on both -index and -httpd when hitting large emails with thousands of lines.
2020-04-04viewdiff: reduce sub parameter count
We're slowly moving towards doing all of our output buffering into a single buffer, so passing that around on the stack as a dedicated parameter is confusing.
2020-04-03quiet "Complex regular subexpression recursion limit" warnings
These seem mostly harmless since Perl will just truncate the match and start a new one on a newline boundary in our case. The only downside is we'd end up with redundant <span> tags in HTML. Limiting the number of line matched ourselves with `{1,$NUM}' doesn't seem prudent since lines vary in length, so we continue to defer the job of limiting matches to the Perl regexp engine. I've noticed this warning in practice on 100K+ line patches to locale data.
2020-03-20viewdiff: favor `qr' to precompile regexps
We can also avoid `o' regexp modifier, since it isn't recommended by Perl upstream, anymore (although we don't have any bugs or unintended behavior because of it).
2020-03-20www: avoid `state' usage to perform allocations up-front
We want WWW->preload to get as many immortal allocations done as possible, and the `state' feature from Perl 5.10 prevents that.
2020-02-24viewdiff: remove optional CR handling
The only caller of `flush_diff' is `add_text_body', and that already did CRLF conversion on the text part. The regexps in SolverGit still need to preserve CR, however, since that actually applies patches (instead of rendering them), and we need to preserve CRLF patches for CRLF files.
2020-02-17viewdiff: do not generate "a=" parameter if "b=" matches
Long URLs waste bandwidth and redundant query parameters make caching more difficult and expensive. Fixes: ddec19694cbf0e1d ("viewdiff: rewrite and simplify")
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27viewdiff: rewrite and simplify
Instead of going line-by-line, use split() with a giant regexp to capture groups of contiguous lines. This offloads state management to the regexp itself and makes it FAR easier to keep track of <span> and </span> pairings. Performance seems roughly on par after this change for the meta@public-inbox archives. It seems a tiny bit faster for git@vger with xt/perf-msgview.t, likely due to the longer messages and larger contiguous groups of lines having the same prefix (or no prefix at all) and drastically reduces the number of subroutine calls and Perl ops executed.
2020-01-27viewdiff: use autovivification for long_path hash
No sense in wasting code to do something the interpreter already does for us.
2020-01-27viewdiff: add "b=" param when missing "diff --git" line
<2841d2de-32ad-eae8-6039-9251a40bb00e@tngtech.com> as posted to git@vger contained an otherwise valid diff without a "diff --git" line. Generate a "b=" parameter in that case using the "+++" line instead of the "diff --git" line. SearchIdx.pm no longer uses the "diff --git" line for filename information, either.
2020-01-27viewdiff: add "b=" param with non-standard diff prefix
<20180228012207.GB251290@aiede.svl.corp.google.com> (posted to git@vger) uses "i" and "w" prefixes instead of the standard "a" and "b" prefixes, ensure we emit a "b=$FILENAME" param for the solver endpoint to improve search accuracy, syntax highlighting, and information density in the URL itself.
2020-01-27linkify: move to_html over from ViewDiff
We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
2020-01-06treewide: "require" + "use" cleanup and docs
There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it it gives us extra compile-time checking.
2020-01-04viewdiff: do not anchor spaces after filenames in diffstat
Viewing a CSS-less page in a browser which underlines links can show a long line of underscores after diffstats. Not all browsers underline links by default, though.
2019-07-05viewdiff: do not anchor using diffstat comments
Diffstat summary comments were added to git last year and we need to filter them out to get anchors working properly. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> https://public-inbox.org/meta/20190704231123.GF20404@szeder.dev/
2019-06-04solver|viewdiff: restrict digit matches to ASCII
git would not generate non-ASCII digits to describe hunk offsets, so don't waste more time than necessary to make sense of non-ASCII digit chars for line offsets.
2019-05-31viewdiff: avoid repeat variable expansion
This is worth a 1-2% speedup in t/perf-msgview.t rendering 2620 messages currently in https://public-inbox.org/meta/
2019-05-16Revert "view: perform highlighting for space-prefixed diffs"
This was buggy and was causing non-diff text to have extra leading spaces. The diff parsing code needs to be cleaned up, so this will be fixed, later. This reverts commit 1a67b91c1326efa372d1ec957e2494849d894f0b.
2019-05-16view: perform highlighting for space-prefixed diffs
"git format-patch --interdiff" and similar can prefix diffs with leading white space. Teach our diff parser to account for it and set appropriate CSS classes for them.
2019-04-26viewdiff: do not break out of DSTATE_CTX on /^$/
It seems a common case for mangled patches is editors or MUAs dropping trailing whitespace, and lines matching /^ $/ gets the space dropped to only match /^$/.
2019-04-15viewdiff: document constants
We'll be building off of this for showing diffs in the coderepo views.
2019-02-04viewdiff: group path match to not be confused by "/dev/null"
Leaving out parentheses caused transitions to state="del" or state="add" to be misidentified. cf. https://public-inbox.org/meta/20190204105454.GG10587@szeder.dev/ Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
2019-02-01viewdiff: support renames and long paths in diffstat anchors
This is best-effort, but works well-enough in practice for projects which use shell-friendly filenames as well as the long path names for some Linux kernel selftests.