user/dev discussion of public-inbox itself
* [PATCH 0/2] viewdiff: linkification fixes
@ 2020-05-06 10:40 Eric Wong
  2020-05-06 10:40 ` [PATCH 1/2] viewdiff: assume diffstat and diff order are identical Eric Wong
  2020-05-06 10:40 ` [PATCH 2/2] viewdiff: stricter highlighting and linkification check Eric Wong
  0 siblings, 2 replies; 3+ messages in thread
From: Eric Wong @ 2020-05-06 10:40 UTC
  To: meta

Diffstat linkification of long file names is no longer hash
order dependent, since I noticed some HTML rendering differences
between PublicInbox::MIME and PublicInbox::Eml (its
non-Email::MIME replacement).

I also noticed some wasted work in patch series cover letters
which include diffstats, as well as over-linkification of
tables in cover letters which have no other diff features.

Eric Wong (2):
  viewdiff: assume diffstat and diff order are identical
  viewdiff: stricter highlighting and linkification check

 lib/PublicInbox/View.pm     |  7 +++++--
 lib/PublicInbox/ViewDiff.pm | 27 ++++++++++++---------------
 2 files changed, 17 insertions(+), 17 deletions(-)

* [PATCH 1/2] viewdiff: assume diffstat and diff order are identical
  2020-05-06 10:40 [PATCH 0/2] viewdiff: linkification fixes Eric Wong
@ 2020-05-06 10:40 ` Eric Wong
  2020-05-06 10:40 ` [PATCH 2/2] viewdiff: stricter highlighting and linkification check Eric Wong
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Wong @ 2020-05-06 10:40 UTC
  To: meta

For non-malicious messages, we can assume the diffstat and actual
diff appear in the same order.  Thus we can store {-long_paths} as
an arrayref and only compare the first element when we encounter
a truncated path.
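
As a rough illustration of the queue idea above (not the actual
anchor0()/anchor1() code), here is a minimal standalone sketch.
The helper names note_diffstat_path() and match_diff_path() and the
sample paths are hypothetical, made up for this example only:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the diffstat pass and the diff pass.
my @long_paths; # truncated diffstat names, kept in diffstat order

sub note_diffstat_path {
	my ($fn) = @_;
	# diffstat abbreviates long names as ".../tail/of/path";
	# only remember names which had that prefix stripped
	push(@long_paths, $fn) if $fn =~ s!\A\.\.\./?!!;
}

sub match_diff_path {
	my ($path) = @_;
	# only the oldest pending entry is considered; it is consumed
	# whether or not it matches, so out-of-order input is ignored
	my $fn = shift(@long_paths) or return;
	$path =~ /\Q$fn\E\z/s ? $fn : undef;
}

note_diffstat_path('.../net/wireless/a/Makefile');
note_diffstat_path('.../net/wireless/b/Makefile');
note_diffstat_path('README'); # not truncated, never queued

my $first = match_diff_path('drivers/net/wireless/a/Makefile');
my $second = match_diff_path('drivers/net/wireless/b/Makefile');
my $third = match_diff_path('README'); # queue is empty by now
print "$first\n$second\n";
```

Since entries are consumed in order, two paths sharing a basename
are matched in a fixed sequence rather than in whatever order
hash keys happen to iterate.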

This should make HTML rendering stable when there are basename
conflicts in a message such as
https://lore.kernel.org/backports/1393202754-12919-13-git-send-email-hauke@hauke-m.de/

This diffstat anchor linkification can still be defeated by
users who make actual path names beginning with "...", but we
won't waste CPU cycles on it, either.
---
 lib/PublicInbox/ViewDiff.pm | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index 3d6058a9..34df8ad4 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -82,10 +82,8 @@ sub anchor0 ($$$$) {
 	$fn =~ s/{(?:.+) => (.+)}/$1/ or $fn =~ s/.* => (.+)/$1/;
 	$fn = git_unquote($fn);
 
-	# long filenames will require us to walk backwards in anchor1
-	if ($fn =~ s!\A\.\.\./?!!) {
-		$ctx->{-long_path}->{$fn} = qr/\Q$fn\E\z/s;
-	}
+	# long filenames will require us to check in anchor1()
+	push(@{$ctx->{-long_path}}, $fn) if $fn =~ s!\A\.\.\./?!!;
 
 	if (my $attr = to_attr($ctx->{-apfx}.$fn)) {
 		$ctx->{-anchors}->{$attr} = 1;
@@ -105,17 +103,14 @@ sub anchor1 ($$) {
 
 	my $ok = delete $ctx->{-anchors}->{$attr};
 
-	# unlikely, check the end of all long path names we captured:
+	# unlikely, check the end of long path names we captured,
+	# assume diffstat and diff output follow the same order,
+	# and ignore different ordering (could be malicious input)
 	unless ($ok) {
-		my $lp = $ctx->{-long_path} or return;
-		foreach my $fn (keys %$lp) {
-			$pb =~ $lp->{$fn} or next;
-
-			delete $lp->{$fn};
-			$attr = to_attr($ctx->{-apfx}.$fn) or return;
-			$ok = delete $ctx->{-anchors}->{$attr} or return;
-			last;
-		}
+		my $fn = shift(@{$ctx->{-long_path}}) or return;
+		$pb =~ /\Q$fn\E\z/s or return;
+		$attr = to_attr($ctx->{-apfx}.$fn) or return;
+		$ok = delete $ctx->{-anchors}->{$attr} or return;
 	}
 	$ok ? "<a\nhref=#i$attr\nid=$attr>diff</a> --git" : undef
 }

* [PATCH 2/2] viewdiff: stricter highlighting and linkification check
  2020-05-06 10:40 [PATCH 0/2] viewdiff: linkification fixes Eric Wong
  2020-05-06 10:40 ` [PATCH 1/2] viewdiff: assume diffstat and diff order are identical Eric Wong
@ 2020-05-06 10:40 ` Eric Wong
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Wong @ 2020-05-06 10:40 UTC
  To: meta

Sometimes senders draw ASCII tables and such, which fool us
into attempting highlighting and diffstat anchoring.

We now require 3 consecutive diff header lines:

	/^--- /, /^\Q+++\E /, and /^@@ /

to enable diff highlighting (whether generated with git or not).
The presence of a line matching /^diff / is not sufficient or
even useful to us for highlighting diffs, since that could just
be part of a line-wrapped sentence.

However, we'll now check for the presence of a line matching
/^diff --git / before enabling diffstat anchors.  Otherwise
cover letters for a patch series may fool us into creating
anchors for diffstats.
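
To illustrate the two checks described above, here is a small
standalone sketch.  The regexps are the ones used in this patch,
but the helper names looks_like_diff() and wants_anchors() and the
sample message bodies are hypothetical and exist only for this
example:

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the regexps from this patch.
sub looks_like_diff {
	my ($s) = @_;
	# three consecutive header lines: "--- ", "+++ " and "@@ "
	($s =~ /^--- [^\n]+\n\+{3} [^\n]+\n@@ /ms) ? 1 : 0;
}

sub wants_anchors {
	my ($s) = @_;
	# diffstat anchors additionally need a "diff --git " line
	(looks_like_diff($s) && $s =~ /^diff --git /sm) ? 1 : 0;
}

# an ASCII table which the old /^(?:diff|---|\+{3}) /ms check accepted
my $table = <<'EOM';
Results of the two runs:
--- old ---
+++ new +++
| a | b |
EOM

# a unified diff which was not generated by git
my $plain = <<'EOM';
--- a/foo
+++ b/foo
@@ -1 +1 @@
-old
+new
EOM

my $git = "diff --git a/foo b/foo\n" . $plain;

for my $s ($table, $plain, $git) {
	print looks_like_diff($s), ' ', wants_anchors($s), "\n";
}
```

The table is rejected outright, the non-git diff is highlighted
without diffstat anchors, and only the git diff gets both.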
---
 lib/PublicInbox/View.pm     | 7 +++++--
 lib/PublicInbox/ViewDiff.pm | 4 +++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 5144a130..f7a8ae32 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -536,11 +536,14 @@ sub add_text_body { # callback for msg_iter
 	# always support diff-highlighting, but we can't linkify hunk
 	# headers for solver unless some coderepo are configured:
 	my $diff;
-	if ($s =~ /^(?:diff|---|\+{3}) /ms) {
+	if ($s =~ /^--- [^\n]+\n\+{3} [^\n]+\n@@ /ms) {
 		# diffstat anchors do not link across attachments or messages:
 		$idx[0] = $upfx . $idx[0] if $upfx ne '';
 		$ctx->{-apfx} = join('/', @idx);
-		$ctx->{-anchors} = {}; # attr => filename
+
+		# do attr => filename mappings for diffstats in git diffs:
+		$ctx->{-anchors} = {} if $s =~ /^diff --git /sm;
+
 		$diff = 1;
 		delete $ctx->{-long_path};
 		my $spfx;
diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index 34df8ad4..6fe9a0d7 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -165,10 +165,12 @@ sub diff_before_or_after ($$) {
 	my ($ctx, $x) = @_;
 	my $linkify = $ctx->{-linkify};
 	my $dst = $ctx->{obuf};
+	my $anchors = exists($ctx->{-anchors}) ? 1 : 0;
 	for my $y (split(/(^---\n)/sm, $$x)) {
 		if ($y =~ /\A---\n\z/s) {
 			$$dst .= "---\n"; # all HTML is "\r\n" => "\n"
-		} elsif ($y =~ /^ [0-9]+ files? changed, /sm) {
+			$anchors |= 2;
+		} elsif ($anchors == 3 && $y =~ /^ [0-9]+ files? changed, /sm) {
 			# ok, looks like a diffstat, go line-by-line:
 			for my $l (split(/^/m, $y)) {
 				if ($l =~ /^ (.+)( +\| .*\z)/s) {


Archives are clonable:
	git clone --mirror http://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git