From: Johannes Sixt
Date: Sun, 24 Oct 2021 11:56:43 +0200
Subject: [PATCH 7/6] userdiff-cpp: back out the digit-separators in numbers
To: git@vger.kernel.org
Cc: Ævar Arnfjörð Bjarmason, Johannes Sixt via GitGitGadget

The implementation of digit-separating single-quotes introduced a
noteworthy regression: a change to a character literal that contains a
digit would splice the digit and the closing single-quote. For example,
the change from 'a' to '2' is now tokenized as '[-a'-]{+2'+} instead of
'[-a-]{+2+}'.
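To see why the quote gets spliced onto the digit, here is a minimal sketch in C using POSIX <regex.h>. It assumes a much simplified word tokenizer (repeatedly take the leftmost, longest match of one extended regex) and three hypothetical number patterns: one that allows a quote after any digit, a tightened one that allows it only between digits, and one without separators. None of these are the actual patterns in userdiff.c, and `tokenize`, `splits_as`, and the `RX_*` names are made up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified word regexes (NOT the patterns in userdiff.c).
 * Each combines a number pattern, an identifier pattern, and a catch-all
 * for any single non-space character.
 */
#define WORD_TAIL "|[a-zA-Z_][a-zA-Z0-9_]*|[^[:space:]]"
#define RX_WITH_SEPARATORS "[0-9][0-9']*" WORD_TAIL     /* quote may follow any digit */
#define RX_TIGHTENED       "[0-9]+('[0-9]+)*" WORD_TAIL /* quote only between digits */
#define RX_NO_SEPARATORS   "[0-9]+" WORD_TAIL           /* no digit separators */

/*
 * Split `text` into words by repeatedly taking the leftmost match of
 * `word_rx` (POSIX regexec picks the longest match at that position).
 * Words are written to `out`, separated by '|'. Whitespace is skipped
 * because no alternative matches it. Returns -1 if `word_rx` is invalid.
 */
static int tokenize(const char *word_rx, const char *text,
                    char *out, size_t out_size)
{
	regex_t re;
	regmatch_t m;
	const char *p = text;
	size_t used = 0;

	assert(out_size > 0);
	if (regcomp(&re, word_rx, REG_EXTENDED) != 0)
		return -1;
	out[0] = '\0';
	/* Every alternative consumes at least one character, so p advances. */
	while (*p && regexec(&re, p, 1, &m, 0) == 0) {
		int n = snprintf(out + used, out_size - used, "%s%.*s",
		                 used ? "|" : "",
		                 (int)(m.rm_eo - m.rm_so), p + m.rm_so);
		if (n < 0 || (size_t)n >= out_size - used)
			break; /* out is full; the result is truncated */
		used += (size_t)n;
		p += m.rm_eo;
	}
	regfree(&re);
	return 0;
}

/* Returns 1 if tokenizing `text` with `word_rx` yields exactly `expected`. */
static int splits_as(const char *word_rx, const char *text,
                     const char *expected)
{
	char buf[256];

	return tokenize(word_rx, text, buf, sizeof(buf)) == 0 &&
	       strcmp(buf, expected) == 0;
}
```

With `RX_WITH_SEPARATORS`, `'a'` splits into `'|a|'` but `'2'` splits into `'|2'`, so a word diff can pair up only the leading quote and reports `a'` replaced by `2'`. The other two patterns both give `'|2|'`. For `1'000'000`, the tightened pattern keeps a single word, while dropping separators gives `1|'|000|'|000`, so each segment becomes its own word, which is the behavior this patch opts for.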
The options to fix the regression are:

- Tighten the regular expression such that the single-quote can only
  occur between digits (that would match the official syntax).

- Remove support for digit separators.

I chose to remove support, because

- I have not seen a lot of code make use of digit separators.

- If code does use digit separators, then the numbers are typically
  long. If a change occurs in one of the segments, it is actually
  easier to see when only that segment is highlighted as the changed
  word instead of the whole long number.

This choice does introduce another minor regression, though, which is
highlighted in the test case: when a change occurs in the second or
later segment of a hexadecimal number where the segment begins with a
digit but also contains letters, the segment is mistaken as consisting
of a number and an identifier. I can live with that.

Signed-off-by: Johannes Sixt
---
 t/t4034/cpp/expect | 12 ++++++------
 t/t4034/cpp/post   | 10 +++++-----
 t/t4034/cpp/pre    |  8 ++++----
 userdiff.c         |  6 +++---
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
index 5ff4ce477b..dc500ae092 100644
--- a/t/t4034/cpp/expect
+++ b/t/t4034/cpp/expect
@@ -1,21 +1,21 @@
 diff --git a/pre b/post
-index 144cd98..64e78af 100644
+index a1a09b7..f1b6f3c 100644
 --- a/pre
 +++ b/post
 @@ -1,30 +1,30 @@
 Foo() : x(0&&1&42) { foo0bar(x.findFind); }
 cout<<"Hello World!?\n"<(1 -+1e10 0xabcdef) 'x.'
+(1 -+1e10 0xabcdef) 'x2'
 // long double
-3.141'592'653e-10l3.141'592'654e+10l
+3.141592653e-10l3.141592654e+10l
 // float
 120E5f120E6f
 // hex
-0xdead'beaf0xdead'Beaf+8ULL7ULL
+0xdead0xdeaf'1eaFeaf+8ULL7ULL
 // octal
-0123'45670123'4560
+0123456701234560
 // binary
-0b10'000b11'00+e1
+0b10000b1100+e1
 // expression
 1.5-e+23+f
 // another one
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
index 64e78afbfb..f1b6f3c228 100644
--- a/t/t4034/cpp/post
+++ b/t/t4034/cpp/post
@@ -1,16 +1,16 @@
 Foo() : x(0&42) { bar(x.Find); }
 cout<<"Hello World?\n"<%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*|<=>"),
 PATTERNS("csharp",
 	 /* Keywords */
-- 
2.33.0.129.g739793498e
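The minor regression described in the commit message (a hex segment that starts with a digit but contains letters) can be reproduced with the same kind of simplified tokenizer. This is a sketch under stated assumptions: the `RX_HEX_*` patterns, `tokenize`, and `splits_as` are hypothetical and are not the real userdiff.c cpp patterns or git's word-diff code. A hex pattern that admits quotes keeps `0xdeaf'1eaF` as one word; the separator-free pattern ends the hex word at the quote, and the trailing segment `1eaF` falls apart into the number `1` and the identifier `eaF`.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified word regexes (NOT the patterns in userdiff.c):
 * a hex number, a decimal number, an identifier, or any single
 * non-space character.
 */
#define WORD_TAIL "|[0-9]+|[a-zA-Z_][a-zA-Z0-9_]*|[^[:space:]]"
#define RX_HEX_WITH_SEPARATORS "0[xX][0-9a-fA-F']+" WORD_TAIL /* quote allowed */
#define RX_HEX_NO_SEPARATORS   "0[xX][0-9a-fA-F]+" WORD_TAIL  /* no quote */

/*
 * Split `text` into words by repeatedly taking the leftmost (and, per
 * POSIX, longest) match of `word_rx`; words are joined with '|' in `out`.
 * Returns -1 if `word_rx` does not compile.
 */
static int tokenize(const char *word_rx, const char *text,
                    char *out, size_t out_size)
{
	regex_t re;
	regmatch_t m;
	const char *p = text;
	size_t used = 0;

	assert(out_size > 0);
	if (regcomp(&re, word_rx, REG_EXTENDED) != 0)
		return -1;
	out[0] = '\0';
	/* Every alternative consumes at least one character, so p advances. */
	while (*p && regexec(&re, p, 1, &m, 0) == 0) {
		int n = snprintf(out + used, out_size - used, "%s%.*s",
		                 used ? "|" : "",
		                 (int)(m.rm_eo - m.rm_so), p + m.rm_so);
		if (n < 0 || (size_t)n >= out_size - used)
			break; /* out is full; the result is truncated */
		used += (size_t)n;
		p += m.rm_eo;
	}
	regfree(&re);
	return 0;
}

/* Returns 1 if tokenizing `text` with `word_rx` yields exactly `expected`. */
static int splits_as(const char *word_rx, const char *text,
                     const char *expected)
{
	char buf[256];

	return tokenize(word_rx, text, buf, sizeof(buf)) == 0 &&
	       strcmp(buf, expected) == 0;
}
```

A segment that starts with a letter, such as `Beaf` in `0xdead'Beaf`, still comes out as a single identifier word (`0xdead|'|Beaf`), which is why only segments beginning with a digit are affected.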