From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 1/2] viewdiff: assume diffstat and diff order are identical
Date: Wed, 6 May 2020 10:40:53 +0000
Message-Id: <20200506104054.3074-2-e@yhbt.net>
In-Reply-To: <20200506104054.3074-1-e@yhbt.net>
References: <20200506104054.3074-1-e@yhbt.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

For non-malicious messages, we can assume the diffstat and
actual diff appear in the same order. Thus we can store
{-long_path} as an arrayref and only compare the first element
when we encounter a truncated path.

This should make HTML rendering stable when there are basename
conflicts in a message such as
https://lore.kernel.org/backports/1393202754-12919-13-git-send-email-hauke@hauke-m.de/

This diffstat anchor linkification can still be defeated by
users who make actual path names beginning with "...", but we
won't waste CPU cycles on it, either.
---
 lib/PublicInbox/ViewDiff.pm | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index 3d6058a9..34df8ad4 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -82,10 +82,8 @@ sub anchor0 ($$$$) {
 	$fn =~ s/{(?:.+) => (.+)}/$1/ or $fn =~ s/.* => (.+)/$1/;
 	$fn = git_unquote($fn);
 
-	# long filenames will require us to walk backwards in anchor1
-	if ($fn =~ s!\A\.\.\./?!!) {
-		$ctx->{-long_path}->{$fn} = qr/\Q$fn\E\z/s;
-	}
+	# long filenames will require us to check in anchor1()
+	push(@{$ctx->{-long_path}}, $fn) if $fn =~ s!\A\.\.\./?!!;
 
 	if (my $attr = to_attr($ctx->{-apfx}.$fn)) {
 		$ctx->{-anchors}->{$attr} = 1;
@@ -105,17 +103,14 @@ sub anchor1 ($$) {
 
 	my $ok = delete $ctx->{-anchors}->{$attr};
 
-	# unlikely, check the end of all long path names we captured:
+	# unlikely, check the end of long path names we captured,
+	# assume diffstat and diff output follow the same order,
+	# and ignore different ordering (could be malicious input)
 	unless ($ok) {
-		my $lp = $ctx->{-long_path} or return;
-		foreach my $fn (keys %$lp) {
-			$pb =~ $lp->{$fn} or next;
-
-			delete $lp->{$fn};
-			$attr = to_attr($ctx->{-apfx}.$fn) or return;
-			$ok = delete $ctx->{-anchors}->{$attr} or return;
-			last;
-		}
+		my $fn = shift(@{$ctx->{-long_path}}) or return;
+		$pb =~ /\Q$fn\E\z/s or return;
+		$attr = to_attr($ctx->{-apfx}.$fn) or return;
+		$ok = delete $ctx->{-anchors}->{$attr} or return;
 	}
 	$ok ? "diff --git" : undef
 }
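
For readers who want to see the ordering assumption in isolation, here is
a rough model of the queue-based matching, written in Python rather than
the Perl of the patch. It is only a sketch: record_diffstat_path() and
match_diff_header() are hypothetical names standing in for the relevant
parts of anchor0() and anchor1(), a plain set stands in for
$ctx->{-anchors}, and to_attr() and the -apfx prefix are left out
entirely.

```python
import re
from collections import deque

# Same pattern as s!\A\.\.\./?!! in the patch: a leading "..." plus one optional "/"
_TRUNCATED = re.compile(r"\A\.\.\./?")


def record_diffstat_path(long_paths, anchors, fn):
    """Model of the anchor0() side: remember one diffstat entry.

    Truncated entries (".../foo") are queued in diffstat order so the
    diff side can consume them first-in, first-out.
    """
    suffix, n = _TRUNCATED.subn("", fn)
    if n:
        long_paths.append(suffix)
    anchors.add(suffix)  # the real code stores to_attr(apfx . fn) instead
    return suffix


def match_diff_header(long_paths, anchors, pb):
    """Model of the anchor1() side: find the anchor for diff path `pb`.

    Returns the matched anchor key, or None when nothing matches.
    """
    if pb in anchors:  # common case: the path was not truncated
        anchors.discard(pb)
        return pb
    if not long_paths:
        return None
    # Only the head of the queue is examined, and it is consumed even
    # when it does not match, mirroring shift(...) in the patch.
    suffix = long_paths.popleft()
    if not suffix or not pb.endswith(suffix):
        return None
    if suffix in anchors:
        anchors.discard(suffix)
        return suffix
    return None
```

With diffstat entries ".../ath/Kconfig" and ".../Kconfig" (made-up paths),
both suffixes would match a diff path ending in "ath/Kconfig". Because
only the head of the queue is compared, the result depends solely on
diffstat order and no longer on hash iteration order, which is the
stability the commit message is after.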