From: "Johannes Sixt via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Sixt" <j6t@kdbg.org>
Subject: [PATCH v3 0/6] Fun with cpp word regex
Date: Sun, 10 Oct 2021 17:02:58 +0000
Message-ID: <pull.1054.v3.git.1633885384.gitgitgadget@gmail.com>
In-Reply-To: <pull.1054.v2.git.1633720197.gitgitgadget@gmail.com>

The word regex of the cpp userdiff driver is a bit too loose: where the
intent is to match only a number, it can match much more text.
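
To illustrate the problem (this sketch is not part of the series): the
Python snippet below approximates word splitting by taking, at each
non-space position, the longest match among a list of alternatives, with
a single non-space character as fallback. The LOOSE number pattern is a
hypothetical stand-in for an overly permissive regex and is not quoted
from this message; the TIGHT pattern is the decimal/float alternative
that appears as context in the range-diff below. The helper name
tokenize() is made up for the example, and Python's re module only
stands in for the POSIX regex routines that git itself uses.

```python
import re

def tokenize(text, alternatives):
    """Split text into words: longest alternative wins at each position."""
    # Fallback: any single non-space character is a word of its own.
    patterns = [re.compile(a) for a in alternatives] + [re.compile(r"\S")]
    space = re.compile(r"\s+")
    tokens, pos = [], 0
    while pos < len(text):
        gap = space.match(text, pos)
        if gap:
            pos = gap.end()
            continue
        matches = [p.match(text, pos) for p in patterns]
        end = max(m.end() for m in matches if m)
        tokens.append(text[pos:end])
        pos = end
    return tokens

IDENT = r"[a-zA-Z_][a-zA-Z0-9_]*"
# Hypothetical loose number pattern (an assumption for illustration).
LOOSE = [IDENT, r"[-+0-9.e]+[fFlL]?"]
# Tightened decimal/float pattern as shown in the range-diff context.
TIGHT = [IDENT, r"[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*"]

print(tokenize("1.5-e+2+f", LOOSE))  # the whole expression is one word
print(tokenize("1.5-e+2+f", TIGHT))  # numbers, operators, identifiers
```

With the loose pattern the entire expression is a single word, so a
change from 2 to 3 would highlight all of it; with the tight pattern
only the changed number is its own word.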

The first patch makes the cpp word regex tests more effective.

The second patch adds problematic test cases. The third patch fixes these
problems.

The remaining three patches add test cases and support for digit separators
and the spaceship operator <=> (the three-way comparison operator).
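
As a rough illustration of what these patches change (a sketch, not the
code in userdiff.c): the Python snippet below compares word splitting
before and after, using the number and operator alternatives that are
visible in the range-diff below. Only a subset of the operator
alternatives is included. The helper tokenize() and the names BEFORE and
AFTER are made up for this example; picking the longest alternative at
each position is meant to approximate POSIX leftmost-longest matching,
which Python's re module does not do on its own.

```python
import re

def tokenize(text, alternatives):
    """Split text into words: longest alternative wins at each position."""
    # Fallback: any single non-space character is a word of its own.
    patterns = [re.compile(a) for a in alternatives] + [re.compile(r"\S")]
    space = re.compile(r"\s+")
    tokens, pos = [], 0
    while pos < len(text):
        gap = space.match(text, pos)
        if gap:
            pos = gap.end()
            continue
        matches = [p.match(text, pos) for p in patterns]
        end = max(m.end() for m in matches if m)
        tokens.append(text[pos:end])
        pos = end
    return tokens

IDENT = r"[a-zA-Z_][a-zA-Z0-9_]*"
TWO_CHAR_OP = r"[-+*/<>%&^|=!]="  # subset of the operator alternatives

BEFORE = [IDENT,
          r"[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*",
          r"0[xXbB][0-9a-fA-F]+[lLuU]*",
          TWO_CHAR_OP]
AFTER = [IDENT,
         r"[0-9][0-9.']*([Ee][-+]?[0-9]+)?[fFlLuU]*",  # ' allowed in digits
         r"0[xXbB][0-9a-fA-F']+[lLuU]*",               # ' allowed in hex
         TWO_CHAR_OP,
         r"<=>"]                                       # spaceship operator

for sample in ("3.141'592'654e+10l", "0xdead'Beaf", "i<=>j"):
    print(sample, tokenize(sample, BEFORE), tokenize(sample, AFTER))
```

Before the change, each single-quote ends a word and <=> falls apart
into <= and >; afterwards each number and the operator is one word.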

I left out support for hexadecimal floating point constants because that
would require tightening the regex even further to keep entire
expressions from being treated as single tokens.

Changes since V2:

 * Add test cases for the new features in a separate commit so that the new
   behavior is easier to see.
 * Don't treat .' (as in the character literal '.') as a token.
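
The second item can be reproduced with a small sketch (illustration
only, not the code in userdiff.c). It compares the v2 and v3 forms of
the "floating point number that begins with a decimal point" alternative
as they appear in the range-diff below. The helper tokenize() and the
names V2_DOT and V3_DOT are made up for this example, and Python's re
module stands in for the POSIX regex routines that git uses.

```python
import re

def tokenize(text, alternatives):
    """Split text into words: longest alternative wins at each position."""
    # Fallback: any single non-space character is a word of its own.
    patterns = [re.compile(a) for a in alternatives] + [re.compile(r"\S")]
    space = re.compile(r"\s+")
    tokens, pos = [], 0
    while pos < len(text):
        gap = space.match(text, pos)
        if gap:
            pos = gap.end()
            continue
        matches = [p.match(text, pos) for p in patterns]
        end = max(m.end() for m in matches if m)
        tokens.append(text[pos:end])
        pos = end
    return tokens

# v2: a single-quote may directly follow the decimal point.
V2_DOT = r"\.[0-9']+([Ee][-+]?[0-9]+)?[fFlL]?"
# v3: a digit is required right after the decimal point.
V3_DOT = r"\.[0-9][0-9']*([Ee][-+]?[0-9]+)?[fFlL]?"

print(tokenize("'.'", [V2_DOT]))      # .' is glued into one word
print(tokenize("'.'", [V3_DOT]))      # three separate one-character words
print(tokenize(".5'000f", [V3_DOT]))  # a real number stays one word
```

Requiring a digit after the decimal point keeps the character literal
'.' from contributing a bogus .' word, while numbers such as .5'000f
are still matched whole.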

Changes since V1:

 * Tests, tests, tests.
 * Polished commit messages.

Johannes Sixt (6):
  t4034/cpp: actually test that operator tokens are not split
  t4034: add tests showing problematic cpp tokenizations
  userdiff-cpp: tighten word regex
  userdiff-cpp: prepare test cases with yet unsupported features
  userdiff-cpp: permit the digit-separating single-quote in numbers
  userdiff-cpp: learn the C++ spaceship operator

 t/t4034/cpp/expect | 63 +++++++++++++++++++++++-----------------------
 t/t4034/cpp/post   | 47 +++++++++++++++++++++-------------
 t/t4034/cpp/pre    | 41 +++++++++++++++++++-----------
 userdiff.c         | 10 ++++++--
 4 files changed, 94 insertions(+), 67 deletions(-)


base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1054%2Fj6t%2Ffun-with-cpp-word-regex-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1054/j6t/fun-with-cpp-word-regex-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1054

Range-diff vs v2:

 1:  dd9f82ba712 = 1:  dd9f82ba712 t4034/cpp: actually test that operator tokens are not split
 2:  5a84fc9cf71 = 2:  5a84fc9cf71 t4034: add tests showing problematic cpp tokenizations
 3:  d4ebe45fddc = 3:  d4ebe45fddc userdiff-cpp: tighten word regex
 4:  dd75d19cee9 ! 4:  c9f58b5e82f userdiff-cpp: permit the digit-separating single-quote in numbers
     @@ Metadata
      Author: Johannes Sixt <j6t@kdbg.org>
      
       ## Commit message ##
     -    userdiff-cpp: permit the digit-separating single-quote in numbers
     +    userdiff-cpp: prepare test cases with yet unsupported features
      
     -    Since C++17, the single-quote can be used as digit separator:
     -
     -       3.141'592'654
     -       1'000'000
     -       0xdead'beaf
     -
     -    Make it known to the word regex of the cpp driver, so that numbers are
     -    not split into separate tokens at the single-quotes.
     +    We are going to add support for C++'s digit-separating single-quote and
     +    the spaceship operator. By adding the test cases in this separate
     +    commit, the effect on the word highlighting will become more obvious
     +    as the features are implemented and the file cpp/expect is updated.
      
          Signed-off-by: Johannes Sixt <j6t@kdbg.org>
      
     @@ t/t4034/cpp/expect
      @@
       <BOLD>diff --git a/pre b/post<RESET>
      -<BOLD>index 1229cdb..3feae6f 100644<RESET>
     -+<BOLD>index 60f3640..f6fbf7b 100644<RESET>
     ++<BOLD>index 144cd98..64e78af 100644<RESET>
       <BOLD>--- a/pre<RESET>
       <BOLD>+++ b/post<RESET>
       <CYAN>@@ -1,30 +1,30 @@<RESET>
     -@@ t/t4034/cpp/expect: Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <RED>foo0<RESET><GREEN>bar<RESET>
     + Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <RED>foo0<RESET><GREEN>bar<RESET>(x.<RED>find<RESET><GREEN>Find<RESET>); }
       cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
     - <GREEN>(<RESET>1 <RED>-<RESET><GREEN>+<RESET>1e10 0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
     +-<GREEN>(<RESET>1 <RED>-<RESET><GREEN>+<RESET>1e10 0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
     ++<GREEN>(<RESET>1 <RED>-<RESET><GREEN>+<RESET>1e10 0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>.<RESET>'
       // long double<RESET>
      -<RED>3.141592653e-10l<RESET><GREEN>3.141592654e+10l<RESET>
     -+<RED>3.141'592'653e-10l<RESET><GREEN>3.141'592'654e+10l<RESET>
     ++3.141'592'<RED>653e-10l<RESET><GREEN>654e+10l<RESET>
       // float<RESET>
       <RED>120E5f<RESET><GREEN>120E6f<RESET>
       // hex<RESET>
      -<RED>0xdeadbeaf<RESET><GREEN>0xdeadBeaf<RESET>+<RED>8ULL<RESET><GREEN>7ULL<RESET>
     -+<RED>0xdead'beaf<RESET><GREEN>0xdead'Beaf<RESET>+<RED>8ULL<RESET><GREEN>7ULL<RESET>
     ++0xdead'<RED>beaf<RESET><GREEN>Beaf<RESET>+<RED>8ULL<RESET><GREEN>7ULL<RESET>
       // octal<RESET>
      -<RED>01234567<RESET><GREEN>01234560<RESET>
     -+<RED>0123'4567<RESET><GREEN>0123'4560<RESET>
     ++0123'<RED>4567<RESET><GREEN>4560<RESET>
       // binary<RESET>
      -<RED>0b1000<RESET><GREEN>0b1100<RESET>+e1
     -+<RED>0b10'00<RESET><GREEN>0b11'00<RESET>+e1
     ++<RED>0b10<RESET><GREEN>0b11<RESET>'00+e1
       // expression<RESET>
       1.5-e+<RED>2<RESET><GREEN>3<RESET>+f
       // another one<RESET>
     +@@ t/t4034/cpp/expect: str.e+<RED>65<RESET><GREEN>75<RESET>
     + a<RED>*<RESET><GREEN>*=<RESET>b c<RED>/<RESET><GREEN>/=<RESET>d e<RED>%<RESET><GREEN>%=<RESET>f
     + a<RED>+<RESET><GREEN>++<RESET>b c<RED>-<RESET><GREEN>--<RESET>d
     + a<RED><<<RESET><GREEN><<=<RESET>b c<RED>>><RESET><GREEN>>>=<RESET>d
     +-a<RED><<RESET><GREEN><=<RESET>b c<RED><=<RESET><GREEN><<RESET>d e<RED>><RESET><GREEN>>=<RESET>f g<RED>>=<RESET><GREEN>><RESET>h
     ++a<RED><<RESET><GREEN><=<RESET>b c<RED><=<RESET><GREEN><<RESET>d e<RED>><RESET><GREEN>>=<RESET>f g<RED>>=<RESET><GREEN>><RESET>h i<=<GREEN>><RESET>j
     + a<RED>==<RESET><GREEN>!=<RESET>b c<RED>!=<RESET><GREEN>=<RESET>d
     + a<RED>^<RESET><GREEN>^=<RESET>b c<RED>|<RESET><GREEN>|=<RESET>d e<RED>&&<RESET><GREEN>&=<RESET>f
     + a<RED>||<RESET><GREEN>|<RESET>b
      
       ## t/t4034/cpp/post ##
     -@@ t/t4034/cpp/post: Foo() : x(0&42) { bar(x.Find); }
     +@@
     + Foo() : x(0&42) { bar(x.Find); }
       cout<<"Hello World?\n"<<endl;
     - (1 +1e10 0xabcdef) 'y'
     +-(1 +1e10 0xabcdef) 'y'
     ++(1 +1e10 0xabcdef) '.'
       // long double
      -3.141592654e+10l
      +3.141'592'654e+10l
     @@ t/t4034/cpp/post: Foo() : x(0&42) { bar(x.Find); }
       // expression
       1.5-e+3+f
       // another one
     +@@ t/t4034/cpp/post: str.e+75
     + a*=b c/=d e%=f
     + a++b c--d
     + a<<=b c>>=d
     +-a<=b c<d e>=f g>h
     ++a<=b c<d e>=f g>h i<=>j
     + a!=b c=d
     + a^=b c|=d e&=f
     + a|b
      
       ## t/t4034/cpp/pre ##
      @@ t/t4034/cpp/pre: Foo():x(0&&1){ foo0( x.find); }
     @@ t/t4034/cpp/pre: Foo():x(0&&1){ foo0( x.find); }
       // expression
       1.5-e+2+f
       // another one
     -
     - ## userdiff.c ##
     -@@ userdiff.c: PATTERNS("cpp",
     - 	 /* identifiers and keywords */
     - 	 "[a-zA-Z_][a-zA-Z0-9_]*"
     - 	 /* decimal and octal integers as well as floatingpoint numbers */
     --	 "|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*"
     -+	 "|[0-9][0-9.']*([Ee][-+]?[0-9]+)?[fFlLuU]*"
     - 	 /* hexadecimal and binary integers */
     --	 "|0[xXbB][0-9a-fA-F]+[lLuU]*"
     -+	 "|0[xXbB][0-9a-fA-F']+[lLuU]*"
     - 	 /* floatingpoint numbers that begin with a decimal point */
     --	 "|\\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?"
     -+	 "|\\.[0-9']+([Ee][-+]?[0-9]+)?[fFlL]?"
     - 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
     - PATTERNS("csharp",
     - 	 /* Keywords */
     +@@ t/t4034/cpp/pre: str.e+65
     + a*b c/d e%f
     + a+b c-d
     + a<<b c>>d
     +-a<b c<=d e>f g>=h
     ++a<b c<=d e>f g>=h i<=j
     + a==b c!=d
     + a^b c|d e&&f
     + a||b
 -:  ----------- > 5:  037c743d9e3 userdiff-cpp: permit the digit-separating single-quote in numbers
 5:  43a701f5ffd ! 6:  cc9dc967f10 userdiff-cpp: learn the C++ spaceship operator
     @@ Commit message
          Signed-off-by: Johannes Sixt <j6t@kdbg.org>
      
       ## t/t4034/cpp/expect ##
     -@@
     - <BOLD>diff --git a/pre b/post<RESET>
     --<BOLD>index 60f3640..f6fbf7b 100644<RESET>
     -+<BOLD>index 144cd98..244f79c 100644<RESET>
     - <BOLD>--- a/pre<RESET>
     - <BOLD>+++ b/post<RESET>
     - <CYAN>@@ -1,30 +1,30 @@<RESET>
      @@ t/t4034/cpp/expect: str.e+<RED>65<RESET><GREEN>75<RESET>
       a<RED>*<RESET><GREEN>*=<RESET>b c<RED>/<RESET><GREEN>/=<RESET>d e<RED>%<RESET><GREEN>%=<RESET>f
       a<RED>+<RESET><GREEN>++<RESET>b c<RED>-<RESET><GREEN>--<RESET>d
       a<RED><<<RESET><GREEN><<=<RESET>b c<RED>>><RESET><GREEN>>>=<RESET>d
     --a<RED><<RESET><GREEN><=<RESET>b c<RED><=<RESET><GREEN><<RESET>d e<RED>><RESET><GREEN>>=<RESET>f g<RED>>=<RESET><GREEN>><RESET>h
     +-a<RED><<RESET><GREEN><=<RESET>b c<RED><=<RESET><GREEN><<RESET>d e<RED>><RESET><GREEN>>=<RESET>f g<RED>>=<RESET><GREEN>><RESET>h i<=<GREEN>><RESET>j
      +a<RED><<RESET><GREEN><=<RESET>b c<RED><=<RESET><GREEN><<RESET>d e<RED>><RESET><GREEN>>=<RESET>f g<RED>>=<RESET><GREEN>><RESET>h i<RED><=<RESET><GREEN><=><RESET>j
       a<RED>==<RESET><GREEN>!=<RESET>b c<RED>!=<RESET><GREEN>=<RESET>d
       a<RED>^<RESET><GREEN>^=<RESET>b c<RED>|<RESET><GREEN>|=<RESET>d e<RED>&&<RESET><GREEN>&=<RESET>f
       a<RED>||<RESET><GREEN>|<RESET>b
      
     - ## t/t4034/cpp/post ##
     -@@ t/t4034/cpp/post: str.e+75
     - a*=b c/=d e%=f
     - a++b c--d
     - a<<=b c>>=d
     --a<=b c<d e>=f g>h
     -+a<=b c<d e>=f g>h i<=>j
     - a!=b c=d
     - a^=b c|=d e&=f
     - a|b
     -
     - ## t/t4034/cpp/pre ##
     -@@ t/t4034/cpp/pre: str.e+65
     - a*b c/d e%f
     - a+b c-d
     - a<<b c>>d
     --a<b c<=d e>f g>=h
     -+a<b c<=d e>f g>=h i<=j
     - a==b c!=d
     - a^b c|d e&&f
     - a||b
     -
       ## userdiff.c ##
      @@ userdiff.c: PATTERNS("cpp",
       	 "|0[xXbB][0-9a-fA-F']+[lLuU]*"
       	 /* floatingpoint numbers that begin with a decimal point */
     - 	 "|\\.[0-9']+([Ee][-+]?[0-9]+)?[fFlL]?"
     + 	 "|\\.[0-9][0-9']*([Ee][-+]?[0-9]+)?[fFlL]?"
      -	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
      +	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*|<=>"),
       PATTERNS("csharp",

-- 
gitgitgadget
