path: root/lib/PublicInbox/ViewDiff.pm
Date Commit message
2020-09-16 treewide: relax allow >=40 chars for git OID
This will help with eventual git SHA-256 transitions.
2020-05-09 viewdiff: don't increment the reported hunk line number
For a diff hunk starting at line N, diff_hunk() constructs the link with "#n(N + 1)". This sends the viewer one line below the first context line. Although this is minor and may not even be noticed, there's not an obvious reason to increment the line number, so switch to using the reported value as is.
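The anchor construction described above can be sketched in Python as follows. This is only an illustration, not the Perl code in diff_hunk(): the function name `hunk_anchors` and the regexp are assumptions, and only the "#nN" fragment format comes from the commit message.

```python
import re
from typing import Optional, Tuple

# Unified diff hunk header: "@@ -OLD[,LEN] +NEW[,LEN] @@ optional context"
HUNK_RE = re.compile(r"^@@ -([0-9]+)(?:,[0-9]+)? \+([0-9]+)(?:,[0-9]+)? @@")

def hunk_anchors(hunk_header: str) -> Optional[Tuple[str, str]]:
    """Return ("#nOLD", "#nNEW") using the reported line numbers as-is."""
    m = HUNK_RE.match(hunk_header)
    if m is None:
        return None
    # No "+ 1" here: the anchor points at the first line the hunk reports.
    return "#n" + m.group(1), "#n" + m.group(2)
```

With this sketch, a header reporting old line 12 and new line 15 yields anchors for exactly those lines rather than the lines below them.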
2020-05-07 viewdiff: stricter highlighting and linkification check
Sometimes senders draw ASCII tables and such which we get fooled into attempting highlighting and diffstat anchoring. We now require 3 consecutive diff header lines: /^--- /, /^\Q+++\E /, and /^@@ / to enable diff highlighting (whether generated with git or not). The presence of a line matching /^diff / is not sufficient or even useful to us for highlighting diffs, since that could just be part of a line-wrapped sentence. However, we'll now check for the presence of a line matching /^diff --git / before enabling diffstat anchors. Otherwise cover letters for a patch series may fool us into creating anchors for diffstats.
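A rough Python rendition of the two checks described above, assuming the message body is available as one string. The names `HEADER_RE`, `DIFF_GIT_RE` and `classify` are hypothetical, and treating the two checks as independent flags is a simplification made for this sketch.

```python
import re

# Three consecutive header lines: "--- ", "+++ ", "@@ "
HEADER_RE = re.compile(r"^--- .*\n\+\+\+ .*\n@@ ", re.MULTILINE)
# A real git diff header, not just a sentence starting with "diff "
DIFF_GIT_RE = re.compile(r"^diff --git ", re.MULTILINE)

def classify(body: str):
    """Return (enable_highlighting, enable_diffstat_anchors)."""
    highlight = HEADER_RE.search(body) is not None
    anchors = DIFF_GIT_RE.search(body) is not None
    return highlight, anchors
```

An ASCII table containing a lone "--- ... ---" row, or a wrapped sentence whose line starts with "diff ", trips neither check.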
2020-05-07 viewdiff: assume diffstat and diff order are identical
For non-malicious messages, we can assume the diffstat and actual diff appear in the same order. Thus we can store {-long_paths} as an arrayref and only compare the first element when we encounter a truncated path. This should make HTML rendering stable when there are basename conflicts in a message such as https://lore.kernel.org/backports/1393202754-12919-13-git-send-email-hauke@hauke-m.de/ This diffstat anchor linkification can still be defeated by users who make actual path names beginning with "...", but we won't waste CPU cycles on it, either.
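The same-order assumption can be illustrated with a small Python sketch. It is not the Perl implementation: the helper name `match_long_path` is made up, and a deque stands in for the {-long_paths} arrayref.

```python
from collections import deque
from typing import Deque, Optional

def match_long_path(pending: Deque[str], full_path: str) -> Optional[str]:
    """Match a path from the diff against the oldest truncated diffstat path.

    `pending` holds truncated diffstat entries ("...tail") in diffstat order.
    Only the first element is compared, which relies on the diffstat and the
    diff listing files in the same order.
    """
    if pending:
        tail = pending[0][3:]  # drop the leading "..."
        if full_path.endswith(tail):
            return pending.popleft()
    return None
```

Because only the head of the queue is consulted, two files sharing a basename (for example two Makefiles) resolve to distinct diffstat entries as long as the order matches.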
2020-04-05 release large (non ref) scalars using `undef $sv'
Using `undef EXPR' like a function call actually frees the heap memory associated with the scalar, whereas `$sv = undef' or `$sv = ""' will hold the buffer around until $sv goes out of scope. The `sv_set_undef' documentation in the perlapi(1) manpage explicitly states this: 'The perl equivalent is "$sv = undef;". Note that it doesn't free any string buffer, unlike "undef $sv".' I've also confirmed this by reading Dump() output from Devel::Peek. We'll also inline the old index_body sub in SearchIdx.pm to make the scope of the scalar more obvious. This change saves several hundred kB RSS on both -index and -httpd when hitting large emails with thousands of lines.
2020-04-04 viewdiff: reduce sub parameter count
We're slowly moving towards doing all of our output buffering into a single buffer, so passing that around on the stack as a dedicated parameter is confusing.
2020-04-03 quiet "Complex regular subexpression recursion limit" warnings
These seem mostly harmless since Perl will just truncate the match and start a new one on a newline boundary in our case. The only downside is we'd end up with redundant <span> tags in HTML. Limiting the number of lines matched ourselves with `{1,$NUM}' doesn't seem prudent since lines vary in length, so we continue to defer the job of limiting matches to the Perl regexp engine. I've noticed this warning in practice on 100K+ line patches to locale data.
2020-03-20 viewdiff: favor `qr' to precompile regexps
We can also avoid the `o' regexp modifier, since it is no longer recommended by Perl upstream (although we don't have any bugs or unintended behavior because of it).
2020-03-20 www: avoid `state' usage to perform allocations up-front
We want WWW->preload to get as many immortal allocations done as possible, and the `state' feature from Perl 5.10 prevents that.
2020-02-24 viewdiff: remove optional CR handling
The only caller of `flush_diff' is `add_text_body', and that already did CRLF conversion on the text part. The regexps in SolverGit still need to preserve CR, however, since that actually applies patches (instead of rendering them), and we need to preserve CRLF patches for CRLF files.
2020-02-17 viewdiff: do not generate "a=" parameter if "b=" matches
Long URLs waste bandwidth and redundant query parameters make caching more difficult and expensive. Fixes: ddec19694cbf0e1d ("viewdiff: rewrite and simplify")
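As an illustration of the rule above, a minimal Python sketch that emits "a=" only when the pre-image path differs from the post-image path. The function name `solver_query` and the parameter order are assumptions, and HTML escaping of the result is left out.

```python
from urllib.parse import quote

def solver_query(path_a: str, path_b: str) -> str:
    """Build the query string for a blob link; "a=" only appears for renames."""
    query = "b=" + quote(path_b)
    if path_a != path_b:
        # Redundant parameters lengthen URLs and fragment caches, so skip
        # "a=" whenever it would repeat "b=".
        query += "&a=" + quote(path_a)
    return query
```

For an unrenamed file the URL carries a single parameter, so every link to that blob shares one cache key.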
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27 viewdiff: rewrite and simplify
Instead of going line-by-line, use split() with a giant regexp to capture groups of contiguous lines. This offloads state management to the regexp itself and makes it FAR easier to keep track of <span> and </span> pairings. Performance seems roughly on par after this change for the meta@public-inbox archives. It seems a tiny bit faster for git@vger with xt/perf-msgview.t, likely due to the longer messages and larger contiguous groups of lines having the same prefix (or no prefix at all), which drastically reduces the number of subroutine calls and Perl ops executed.
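The split-with-capture-groups idea can be sketched in Python like this. The regexp is far smaller than the real one, the CSS class names are placeholders, and file header lines ("---" and "+++") are not handled. The sketch only shows how each contiguous run of same-prefix lines becomes one chunk wrapped in one span.

```python
import html
import re

# One capture group, so re.split() returns the matched runs alongside the
# unmatched text between them.
GROUP_RE = re.compile(
    r"((?:^\+.*\n)+|(?:^-.*\n)+|(?:^@@ .*\n))", re.MULTILINE)

def render(hunk_text: str) -> str:
    out = []
    for chunk in GROUP_RE.split(hunk_text):
        if not chunk:
            continue  # split() yields "" between adjacent matches
        esc = html.escape(chunk)
        if chunk.startswith("@@ "):
            out.append('<span class="hunk">' + esc + "</span>")
        elif chunk.startswith("+"):
            out.append('<span class="add">' + esc + "</span>")
        elif chunk.startswith("-"):
            out.append('<span class="del">' + esc + "</span>")
        else:
            out.append(esc)  # context or non-diff text, no span
    return "".join(out)
```

Every span is opened and closed within one loop iteration, so pairings cannot drift out of sync the way they can with per-line state tracking.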
2020-01-27 viewdiff: use autovivification for long_path hash
No sense in wasting code to do something the interpreter already does for us.
2020-01-27 viewdiff: add "b=" param when missing "diff --git" line
<2841d2de-32ad-eae8-6039-9251a40bb00e@tngtech.com> as posted to git@vger contained an otherwise valid diff without a "diff --git" line. Generate a "b=" parameter in that case using the "+++" line instead of the "diff --git" line. SearchIdx.pm no longer uses the "diff --git" line for filename information, either.
2020-01-27 viewdiff: add "b=" param with non-standard diff prefix
<20180228012207.GB251290@aiede.svl.corp.google.com> (posted to git@vger) uses "i" and "w" prefixes instead of the standard "a" and "b" prefixes. Ensure we emit a "b=$FILENAME" param for the solver endpoint to improve search accuracy, syntax highlighting, and information density in the URL itself.
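One way to derive a filename from the "+++" line regardless of prefix is sketched below in Python. It rests on stated assumptions: the prefix is a single leading path component ("b/", "w/", and so on), the path contains no whitespace, and the helper name `b_param` is invented for this example. The real parser may differ.

```python
import re
from typing import Optional

PLUS_RE = re.compile(r"^\+\+\+ (\S+)")

def b_param(plus_line: str) -> Optional[str]:
    """Return the filename for a "b=" parameter, or None if unavailable."""
    m = PLUS_RE.match(plus_line)
    if m is None:
        return None
    path = m.group(1)
    if path == "/dev/null":
        return None  # deleted file, no post-image name
    # Drop the first component, whatever letter it is ("b", "w", ...).
    parts = path.split("/", 1)
    return parts[1] if len(parts) == 2 else path
```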
2020-01-27 linkify: move to_html over from ViewDiff
We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
2020-01-06 treewide: "require" + "use" cleanup and docs
There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2020-01-04 viewdiff: do not anchor spaces after filenames in diffstat
Viewing a CSS-less page in a browser which underlines links can show a long line of underscores after diffstats. Not all browsers underline links by default, though.
2019-07-05 viewdiff: do not anchor using diffstat comments
Diffstat summary comments were added to git last year and we need to filter them out to get anchors working properly. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> https://public-inbox.org/meta/20190704231123.GF20404@szeder.dev/
2019-06-04 solver|viewdiff: restrict digit matches to ASCII
git would not generate non-ASCII digits to describe hunk offsets, so don't waste more time than necessary to make sense of non-ASCII digit chars for line offsets.
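The same distinction exists in Python, which makes for a compact illustration: on str patterns, `\d` matches any Unicode decimal digit, while `[0-9]` is ASCII-only. The pattern names below are hypothetical and the snippet is an analogy, not the Perl regexps used by solver or viewdiff.

```python
import re

UNICODE_DIGITS = re.compile(r"^@@ -(\d+)")     # also accepts non-ASCII digits
ASCII_DIGITS = re.compile(r"^@@ -([0-9]+)")    # only what git would emit

def old_start(hunk_header: str, ascii_only: bool = True):
    """Return the reported pre-image start line as a string, or None."""
    pattern = ASCII_DIGITS if ascii_only else UNICODE_DIGITS
    m = pattern.match(hunk_header)
    return m.group(1) if m else None
```

A hunk header written with Arabic-Indic digits is rejected by the ASCII-only pattern, and the engine never has to consult Unicode digit tables.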
2019-05-31 viewdiff: avoid repeat variable expansion
This is worth a 1-2% speedup in t/perf-msgview.t rendering 2620 messages currently in https://public-inbox.org/meta/
2019-05-16 Revert "view: perform highlighting for space-prefixed diffs"
This was buggy and was causing non-diff text to have extra leading spaces. The diff parsing code needs to be cleaned up, so this will be fixed, later. This reverts commit 1a67b91c1326efa372d1ec957e2494849d894f0b.
2019-05-16 view: perform highlighting for space-prefixed diffs
"git format-patch --interdiff" and similar can prefix diffs with leading white space. Teach our diff parser to account for it and set appropriate CSS classes for them.
2019-04-26 viewdiff: do not break out of DSTATE_CTX on /^$/
It seems a common case for mangled patches is editors or MUAs dropping trailing whitespace, so lines matching /^ $/ get the space dropped and then only match /^$/.
2019-04-15 viewdiff: document constants
We'll be building off of this for showing diffs in the coderepo views.
2019-02-04 viewdiff: group path match to not be confused by "/dev/null"
Leaving out parentheses caused transitions to state="del" or state="add" to be misidentified. cf. https://public-inbox.org/meta/20190204105454.GG10587@szeder.dev/ Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
2019-02-01 viewdiff: support renames and long paths in diffstat anchors
This is best-effort, but works well enough in practice for projects which use shell-friendly filenames as well as the long path names for some Linux kernel selftests.
2019-02-01 viewdiff: escape HTML ampersand for renames
For URLs we generate, we need to escape '&' in query parameters for correctness.
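A minimal Python sketch of the escaping step, using the standard library's `html.escape`. The helper name `href_attr` and the example URL are made up for illustration.

```python
import html

def href_attr(url: str) -> str:
    """Render a URL as an HTML attribute, escaping '&' (and quotes)."""
    return 'href="' + html.escape(url, quote=True) + '"'
```

A rename link carrying both "b=" and "a=" parameters therefore ends up with "&amp;" between them in the page source, which browsers decode back to "&" when following the link.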
2019-02-01 view: diffstat anchors for multi-message/attachment views
diffstat <-> ^diff anchors work within the same attachment or message while in HTML views which display multiple messages.
2019-02-01 viewdiff: diffstat links to diff anchors
This can be helpful for reviewing larger patches which span across several files on the permalink (/$MESSAGE_ID/) HTML page. More work will be needed to get this working for the /T/ and /t/ pages which show multiple emails, as the filename-based anchors will conflict at the moment.
2019-01-20 viewdiff: do not link to 0{7,40} blobs (again)
We must reset diff context when starting a new file, and we must correctly check for all-zeroes object_ids as the post-image.
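The all-zeroes check can be sketched in Python as below. The "index" line format is standard git output, but the helper name `linkable_oids` is an assumption, and the zero pattern is left open-ended (7 or more zeroes) rather than capped at 40.

```python
import re
from typing import Optional, Tuple

INDEX_RE = re.compile(r"^index ([0-9a-f]{7,})\.\.([0-9a-f]{7,})")
ZERO_OID = re.compile(r"0{7,}")

def linkable_oids(index_line: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (pre, post) object IDs, with None where no blob exists."""
    m = INDEX_RE.match(index_line)
    if m is None:
        return None, None
    pre, post = m.group(1), m.group(2)
    # An all-zeroes OID means "no blob" (new file on the pre-image side,
    # deleted file on the post-image side), so there is nothing to link to.
    if ZERO_OID.fullmatch(pre):
        pre = None
    if ZERO_OID.fullmatch(post):
        post = None
    return pre, post
```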
2019-01-20 viewdiff: quote attributes for Atom feed
We still need to use XHTML in the Atom feed, and XHTML requires attributes to be quoted, whereas HTML 5 does not.
2019-01-20 viewdiff: cleanup state transitions a bit
This makes things less error-prone and allows us to only highlight the "@@ -\S+ \+\S+ @@" part of the hunk header line, without highlighting the function context. This more closely matches the coloring behavior of git-diff(1).
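A small Python sketch of highlighting only the "@@ ... @@" portion, reusing the pattern quoted in the message above. The CSS class name "hunk" and the function name are placeholders.

```python
import html
import re

# Group 1 is the range part; group 2 is the trailing function context.
HUNK_HDR = re.compile(r"^(@@ -\S+ \+\S+ @@)(.*)$")

def render_hunk_header(line: str) -> str:
    m = HUNK_HDR.match(line)
    if m is None:
        return html.escape(line)
    return ('<span class="hunk">' + html.escape(m.group(1)) + "</span>"
            + html.escape(m.group(2)))
```

The function context after the second "@@" is escaped but stays outside the span.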
2019-01-20 viewdiff: support diff-highlighting w/o coderepo
Having diff highlighting alone is still useful, even if blob-resolution/recreation is too expensive or unfeasible.
2019-01-20 view: enforce trailing slash for /$INBOX/$OID/s/ endpoints
As with our use of the trailing slash in '$MESSAGE_ID/T/' and '$MESSAGE_ID/t/' endpoints, this is for 'wget -r --mirror' compatibility as well as allowing sysadmins to quickly stand up a static directory with "index.html" in it to reduce load.
2019-01-20 view: enable naming hints for raw blob downloads
Meaningful names in URLs are nice, and they can make life easier for supporting syntax-highlighting.
2019-01-19 view: wire up diff and vcs viewers with solver