Date | Commit message (Collapse) |
|
We can't assume git output is UTF-8, and we'll always have
legacy data in git coderepos. So attempt to display some
some garbled text rather than nothing at all if Perl croaks
on it.
sox commit c38987e8d20505621b8d872863afa7d233ed1096
(Added raw inverse-bit u-law and A-law support. Updated *.txt files., 2001-12-13)
is an example of a commit which caused problems for me.
|
|
I'm not sure what kind of brain fart introduced this in
c1e7a048be9d32cd, but it happened :x. We'll undef the $x
variable ASAP to save memory and make future errors like this
one more noticeable.
Fixes: c1e7a048be9d ("www: viewdiff: fix UTF-8 names inside mbox attachments")
|
|
Avoid relying on $DIGIT captures when @- and @+ to access
last match start and end, respectively. The elimination of
the post capture ought to allow the use of sv_chop to advance
the string start pointer without memory copies.
This ought to save 1-2MB of memory on my system since I've
noticed the captures was using a big chunk of scratchpad
space.
|
|
This avoids `Wide character in print' warnings and ensures the
UTF-8 characters in `Signed-off-by' trailers are properly rendered
in HTML even when attempting to decode and display
application/octet-stream mbox attachments as HTML.
Linkification and reconstruction for coderepos is probably
still broken, but that is a much bigger task to fix, I think.
Fixes: ab9c03ff4aa369b3 ("www: use PerlIO::scalar (zfh) for buffering")
|
|
It makes for easier reading with gigantic fonts.
|
|
/(.*?)\z/ will capture the "$X insertions(+), $Y deletions(-)"
bit anyways, along with whatever extra notes before the
/^diff --git / line. So just rely on /(.*?)\z/ and avoid
the special case before it.
|
|
Calling Compress::Raw::Zlib::deflate is fairly expensive.
Relying on the `.=' (concat) operator inside ->zadd operator is
faster, but the method dispatch overhead is noticeable compared
to the original code where we had bare `.=' littered throughout.
Fortunately, `print' and `say' with the PerlIO::scalar IO layer
appears to offer better performance without high method dispatch
overhead. This doesn't allow us to save as much memory as I
originally hoped, but does allow us to rely less on concat
operators in other places and just pass a list of args to
`print' and `say' as a appropriate.
This does reduce scratchpad use, however, allowing for large
memory savings, and we still ->deflate every single $eml.
|
|
This allows us to focus string concatenations in one place to
allow Perl internal scratchpad optimizations to reuse memory.
Calling Compress::Raw::Zlib::deflate repeatedly proves too
expensive in terms of CPU cycles.
|
|
Unfortunately, this is actually slower. However, this
hopefully makes it easier to improve the internals and
make performance improvements down the line.
|
|
I'm not sure if Devel::Size::total_size can be trusted due
to the regexps and crashes[1], but when it works, it's showing
around a 900 byte size reduction, too.
[1] https://rt.cpan.org/Public/Bug/Display.html?id=96421
|
|
Instead of appending to an ever-growing {obuf}, we'll reuse
the existing string (which already has pre-allocated memory).
|
|
It's only a short string, so there's not much copy overhead,
and it'll make future changes easier to reason about.
|
|
This also includes some glossary definitions to help users
unfamiliar with git understand the relationship between
trees and blobs.
|
|
The HTML generated for the Atom feed doesn't have the footer
of /T/ and /t/ HTML-only views, so just make "changed" in
the diffstat go directly to the permalink #related anchor.
Fixes: 66512e177390 ("view: generate query in single-message and commit views")
|
|
Some folks unfortunately use "git diff --stat -p" to generate
patches. These messages lack the /^---$/ line and causing
diffstats to not get linkified properly. We now treat the
/^---$/ as optional and rely on the presence of file lines with
/ \| / proceeding a /\d+ files? changed,/ line.
|
|
The dfblob: search prefix is probably under-utilized, but is
extremely powerful IMHO. To make it easier-to-use, add a search
textarea with it prefilled with values for the existing patch
message. This allows users to easily run a query for all
patches which alter or result in either pre or post-image
blobs in the current patch.
Behavior changes are as follows: "changed" in the diffstat
jumps to the bottom of the message. For /T/ and /t/, it
goes to the "related" anchor which is just above the reply
instructions in the single-message view. For the single
message view, it'll jump to the textarea search form.
I initially wanted to use a normal `<a href=' link, but
figured the textarea is advantageous for two reasons:
1) users should be able to edit the query before submitting
2) crawlers are less likely to waste CPU/disk on forms
It's probably too noisy to add this directly to the /T/ and /t/
views, but seems like a good place to put above the reply
instructions in the single message view.
Note that the queries used by the /$COMMIT_OID/s/ view is
subtly different than the /$MSGID/ view since git will lengthen
its abbreviations over time, while emails are immutable.
I tried adding dfn: (filename) and s: (subject) support, but
couldn't come up with cases where it really made sense for
/$MSGID/. /$COMMIT_OID/s/ may benefit from it, since patchid:
could be flaky due to non-standard diff generation options.
|
|
It's less cognitive overhead for future readers since I just
looked at it again and thought it was possible for "0" to be returned
(it isn't).
|
|
Perl 5 doesn't warn on this, yet, but it warns on unescaped
'(' and ')' nowadays, so it's conceivable Perl could start
warning on this in the future. So future-proof our code and
reduce reader confusion.
|
|
There's no sense in using a constant here since it
gets copied into the uri_escape_utf8 function anyways.
Furthermore, inlined constants still leave behind a
subroutine and subs cost several KB of memory.
Finally, add a comment as to why it's different than the default
escape, since I just spent a minute wondering that.
|
|
Prefer "use v5.10", s/base/parent/, rely on "perl -w" for warnings.
We also pass a regexp to the split perlop rather than literal
SV, since split() will compile a new RE every time.
|
|
This is mainly for "lei lcat" where it's the default,
but I find it useful anyways compared to the JSON view.
Colors are loaded from ~/.config/lei/config, and fall back
to using diff colors from a normal git config
(e.g. ~/.gitconfig).
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
This will help with eventual git SHA-256 transitions.
|
|
For a diff hunk starting at line N, diff_hunk() constructs the link
with "#n(N + 1)". This sends the viewer one line below the first
context line. Although this is minor and may not even be noticed,
there's not an obvious reason to increment the line number, so switch
to using the reported value as is.
|
|
Sometimes senders draw ASCII tables and such which we
get fooled into attempting highlighting and diffstat
anchoring.
We now require 3 consecutive diff header lines:
/^--- /, /^\Q+++\E /, and /^@@ /
to enable diff highlighting (whether generated with git or not).
The presence of a line matching /^diff / is not sufficient or
even useful to us for highlighting diffs, since that could just
be part of a line-wrapped sentence.
However, we'll now check for the presence of a line matching
/^diff --git / before enabling diffstat anchors. Otherwise
cover letters for a patch series may fool us into creating
anchors for diffstats.
|
|
For non-malicious messages, we can assume the diffstat and actual
diff appear in the same order. Thus we can store {-long_paths} as
an arrayref and only compare the first element when we encounter
a truncated path.
This should make HTML rendering stable when there's basename
conflicts in message such as
https://lore.kernel.org/backports/1393202754-12919-13-git-send-email-hauke@hauke-m.de/
This diffstat anchor linkification can still be defeated by
users who make actual path names beginning with "...", but we
won't waste CPU cycles on it, either.
|
|
Using `undef EXPR' like a function call actually frees the heap
memory associated with the scalar, whereas `$sv = undef' or
`$sv = ""' will hold the buffer around until $sv goes out
of scope.
The `sv_set_undef' documentation in the perlapi(1) manpage
explicitly states this:
The perl equivalent is "$sv = undef;". Note that it doesn't
free any string buffer, unlike "undef $sv".
And I've confirmed by reading Dump() output from Devel::Peek.
We'll also inline the old index_body sub in SearchIdx.pm to make
the scope of the scalar more obvious.
This change saves several hundred kB RSS on both -index and
-httpd when hitting large emails with thousands of lines.
|
|
We're slowly moving towards doing all of our output buffering
into a single buffer, so passing that around on the stack as
a dedicated parameter is confusing.
|
|
These seem mostly harmless since Perl will just truncate the
match and start a new one on a newline boundary in our case.
The only downside is we'd end up with redundant <span> tags in
HTML.
Limiting the number of line matched ourselves with `{1,$NUM}'
doesn't seem prudent since lines vary in length, so we continue
to defer the job of limiting matches to the Perl regexp engine.
I've noticed this warning in practice on 100K+ line patches to
locale data.
|
|
We can also avoid `o' regexp modifier, since it isn't
recommended by Perl upstream, anymore (although we don't
have any bugs or unintended behavior because of it).
|
|
We want WWW->preload to get as many immortal allocations done
as possible, and the `state' feature from Perl 5.10 prevents that.
|
|
The only caller of `flush_diff' is `add_text_body', and that
already did CRLF conversion on the text part. The regexps in
SolverGit still need to preserve CR, however, since that
actually applies patches (instead of rendering them), and we
need to preserve CRLF patches for CRLF files.
|
|
Long URLs waste bandwidth and redundant query parameters
make caching more difficult and expensive.
Fixes: ddec19694cbf0e1d ("viewdiff: rewrite and simplify")
|
|
I didn't wait until September to do it, this year!
|
|
Instead of going line-by-line, use split() with a giant regexp
to capture groups of contiguous lines. This offloads state
management to the regexp itself and makes it FAR easier to
keep track of <span> and </span> pairings.
Performance seems roughly on par after this change for the
meta@public-inbox archives. It seems a tiny bit faster for
git@vger with xt/perf-msgview.t, likely due to the longer
messages and larger contiguous groups of lines having the same
prefix (or no prefix at all) and drastically reduces the number
of subroutine calls and Perl ops executed.
|
|
No sense in wasting code to do something the interpreter
already does for us.
|
|
<2841d2de-32ad-eae8-6039-9251a40bb00e@tngtech.com> as posted to
git@vger contained an otherwise valid diff without a "diff
--git" line. Generate a "b=" parameter in that case using the
"+++" line instead of the "diff --git" line. SearchIdx.pm no
longer uses the "diff --git" line for filename information,
either.
|
|
<20180228012207.GB251290@aiede.svl.corp.google.com> (posted to
git@vger) uses "i" and "w" prefixes instead of the standard "a"
and "b" prefixes, ensure we emit a "b=$FILENAME" param for the
solver endpoint to improve search accuracy, syntax highlighting,
and information density in the URL itself.
|
|
We use the same idiom in many places for doing two-step
linkification and HTML escaping. Get rid of an outdated
comment in flush_quote while we're at it.
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it it gives
us extra compile-time checking.
|
|
Viewing a CSS-less page in a browser which underlines links
can show a long line of underscores after diffstats. Not all
browsers underline links by default, though.
|
|
Diffstat summary comments were added to git last year and
we need to filter them out to get anchors working properly.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
https://public-inbox.org/meta/20190704231123.GF20404@szeder.dev/
|
|
git would not generate non-ASCII digits to describe
hunk offsets, so don't waste more time than necessary
to make sense of non-ASCII digit chars for line offsets.
|
|
This is worth a 1-2% speedup in t/perf-msgview.t rendering 2620
messages currently in https://public-inbox.org/meta/
|
|
This was buggy and was causing non-diff text to have extra
leading spaces. The diff parsing code needs to be cleaned up,
so this will be fixed, later.
This reverts commit 1a67b91c1326efa372d1ec957e2494849d894f0b.
|
|
"git format-patch --interdiff" and similar can prefix diffs
with leading white space. Teach our diff parser to account
for it and set appropriate CSS classes for them.
|
|
It seems a common case for mangled patches is editors or MUAs
dropping trailing whitespace, and lines matching /^ $/ gets
the space dropped to only match /^$/.
|
|
We'll be building off of this for showing diffs in
the coderepo views.
|
|
Leaving out parentheses caused transitions to state="del" or
state="add" to be misidentified.
cf. https://public-inbox.org/meta/20190204105454.GG10587@szeder.dev/
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
|
|
This is best-effort, but works well-enough in practice for
projects which use shell-friendly filenames as well as the
long path names for some Linux kernel selftests.
|