git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Sixt <j6t@kdbg.org>
To: git@vger.kernel.org
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Sixt via GitGitGadget" <gitgitgadget@gmail.com>
Subject: [PATCH 7/6] userdiff-cpp: back out the digit-separators in numbers
Date: Sun, 24 Oct 2021 11:56:43 +0200	[thread overview]
Message-ID: <a5a05e33-3ad9-e78c-69c0-a466eadeffec@kdbg.org> (raw)
In-Reply-To: <pull.1054.v3.git.1633885384.gitgitgadget@gmail.com>

The implementation of digit-separating single-quotes introduced a
note-worthy regression: the change of a character literal with a
digit would splice the digit and the closing single-quote. For
example, the change from 'a' to '2' is now tokenized as
'[-a'-]{+2'+} instead of '[-a-]{+2+}'.

The options to fix the regression are:

- Tighten the regular expression such that the single-quote can only
  occur between digits (that would match the official syntax).

- Remove support for digit separators.

I chose to remove support, because

- I have not seen a lot of code make use of digit separators.

- If code does use digit separators, then the numbers are typically
  long. If a change in one of the segments occurs, it is actually
  better visible if only that segment is highlighted as the word
  that changed instead of the whole long number.

This choice does introduce another minor regression, though, which
is highlighted in the test case: when a change occurs in the second
or later segment of a hexadecimal number where the segment begins
with a digit, but also has letters, the segment is mistaken as
consisting of a number and an identifier. I can live with that.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
 t/t4034/cpp/expect | 12 ++++++------
 t/t4034/cpp/post   | 10 +++++-----
 t/t4034/cpp/pre    |  8 ++++----
 userdiff.c         |  6 +++---
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
index 5ff4ce477b..dc500ae092 100644
--- a/t/t4034/cpp/expect
+++ b/t/t4034/cpp/expect
@@ -1,21 +1,21 @@
 <BOLD>diff --git a/pre b/post<RESET>
-<BOLD>index 144cd98..64e78af 100644<RESET>
+<BOLD>index a1a09b7..f1b6f3c 100644<RESET>
 <BOLD>--- a/pre<RESET>
 <BOLD>+++ b/post<RESET>
 <CYAN>@@ -1,30 +1,30 @@<RESET>
 Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <RED>foo0<RESET><GREEN>bar<RESET>(x.<RED>find<RESET><GREEN>Find<RESET>); }
 cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
-<GREEN>(<RESET>1 <RED>-<RESET><GREEN>+<RESET>1e10 0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>.<RESET>'
+<GREEN>(<RESET>1 <RED>-<RESET><GREEN>+<RESET>1e10 0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>2<RESET>'
 // long double<RESET>
-<RED>3.141'592'653e-10l<RESET><GREEN>3.141'592'654e+10l<RESET>
+<RED>3.141592653e-10l<RESET><GREEN>3.141592654e+10l<RESET>
 // float<RESET>
 <RED>120E5f<RESET><GREEN>120E6f<RESET>
 // hex<RESET>
-<RED>0xdead'beaf<RESET><GREEN>0xdead'Beaf<RESET>+<RED>8ULL<RESET><GREEN>7ULL<RESET>
+<RED>0xdead<RESET><GREEN>0xdeaf<RESET>'1<RED>eaF<RESET><GREEN>eaf<RESET>+<RED>8ULL<RESET><GREEN>7ULL<RESET>
 // octal<RESET>
-<RED>0123'4567<RESET><GREEN>0123'4560<RESET>
+<RED>01234567<RESET><GREEN>01234560<RESET>
 // binary<RESET>
-<RED>0b10'00<RESET><GREEN>0b11'00<RESET>+e1
+<RED>0b1000<RESET><GREEN>0b1100<RESET>+e1
 // expression<RESET>
 1.5-e+<RED>2<RESET><GREEN>3<RESET>+f
 // another one<RESET>
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
index 64e78afbfb..f1b6f3c228 100644
--- a/t/t4034/cpp/post
+++ b/t/t4034/cpp/post
@@ -1,16 +1,16 @@
 Foo() : x(0&42) { bar(x.Find); }
 cout<<"Hello World?\n"<<endl;
-(1 +1e10 0xabcdef) '.'
+(1 +1e10 0xabcdef) '2'
 // long double
-3.141'592'654e+10l
+3.141592654e+10l
 // float
 120E6f
 // hex
-0xdead'Beaf+7ULL
+0xdeaf'1eaf+7ULL
 // octal
-0123'4560
+01234560
 // binary
-0b11'00+e1
+0b1100+e1
 // expression
 1.5-e+3+f
 // another one
diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre
index 144cd980d6..a1a09b7712 100644
--- a/t/t4034/cpp/pre
+++ b/t/t4034/cpp/pre
@@ -2,15 +2,15 @@ Foo():x(0&&1){ foo0( x.find); }
 cout<<"Hello World!\n"<<endl;
 1 -1e10 0xabcdef 'x'
 // long double
-3.141'592'653e-10l
+3.141592653e-10l
 // float
 120E5f
 // hex
-0xdead'beaf+8ULL
+0xdead'1eaF+8ULL
 // octal
-0123'4567
+01234567
 // binary
-0b10'00+e1
+0b1000+e1
 // expression
 1.5-e+2+f
 // another one
diff --git a/userdiff.c b/userdiff.c
index 7b143ef36b..8578cb0d12 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -67,11 +67,11 @@
 	 /* identifiers and keywords */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 /* decimal and octal integers as well as floatingpoint numbers */
-	 "|[0-9][0-9.']*([Ee][-+]?[0-9]+)?[fFlLuU]*"
+	 "|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*"
 	 /* hexadecimal and binary integers */
-	 "|0[xXbB][0-9a-fA-F']+[lLuU]*"
+	 "|0[xXbB][0-9a-fA-F]+[lLuU]*"
 	 /* floatingpoint numbers that begin with a decimal point */
-	 "|\\.[0-9][0-9']*([Ee][-+]?[0-9]+)?[fFlL]?"
+	 "|\\.[0-9][0-9]*([Ee][-+]?[0-9]+)?[fFlL]?"
 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*|<=>"),
 PATTERNS("csharp",
 	 /* Keywords */
-- 
2.33.0.129.g739793498e

      parent reply	other threads:[~2021-10-24  9:57 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-07  6:50 [PATCH 0/3] Fun with cpp word regex Johannes Sixt via GitGitGadget
2021-10-07  6:50 ` [PATCH 1/3] userdiff: tighten " Johannes Sixt via GitGitGadget
2021-10-07  6:50 ` [PATCH 2/3] userdiff: permit the digit-separating single-quote in numbers Johannes Sixt via GitGitGadget
2021-10-07  6:51 ` [PATCH 3/3] userdiff: learn the C++ spaceship operator Johannes Sixt via GitGitGadget
2021-10-07  9:14 ` [PATCH 0/3] Fun with cpp word regex Ævar Arnfjörð Bjarmason
2021-10-07 16:40   ` Johannes Sixt
2021-10-08 19:09 ` [PATCH v2 0/5] " Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 1/5] t4034/cpp: actually test that operator tokens are not split Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 2/5] t4034: add tests showing problematic cpp tokenizations Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 3/5] userdiff-cpp: tighten word regex Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 4/5] userdiff-cpp: permit the digit-separating single-quote in numbers Johannes Sixt via GitGitGadget
2021-10-08 19:09   ` [PATCH v2 5/5] userdiff-cpp: learn the C++ spaceship operator Johannes Sixt via GitGitGadget
2021-10-08 20:07   ` [PATCH v2 0/5] Fun with cpp word regex Ævar Arnfjörð Bjarmason
2021-10-08 22:11     ` Johannes Sixt
2021-10-09  0:00       ` Ævar Arnfjörð Bjarmason
2021-10-10 20:15         ` Johannes Sixt
2021-10-10 17:02   ` [PATCH v3 0/6] " Johannes Sixt via GitGitGadget
2021-10-10 17:02     ` [PATCH v3 1/6] t4034/cpp: actually test that operator tokens are not split Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 2/6] t4034: add tests showing problematic cpp tokenizations Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 3/6] userdiff-cpp: tighten word regex Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 4/6] userdiff-cpp: prepare test cases with yet unsupported features Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 5/6] userdiff-cpp: permit the digit-separating single-quote in numbers Johannes Sixt via GitGitGadget
2021-10-10 17:03     ` [PATCH v3 6/6] userdiff-cpp: learn the C++ spaceship operator Johannes Sixt via GitGitGadget
2021-10-24  9:56     ` Johannes Sixt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a5a05e33-3ad9-e78c-69c0-a466eadeffec@kdbg.org \
    --to=j6t@kdbg.org \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).