git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language
@ 2022-03-04 13:08 xing zhi jiang
  2022-03-04 13:08 ` [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages xing zhi jiang
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: xing zhi jiang @ 2022-03-04 13:08 UTC (permalink / raw)
  To: git; +Cc: a97410985new

I have already searched the git public-inbox don't find any active patch about
userdiff build-in driver for JavaScript(there is an unfinished patch about 
three years ago). So I pick this as my GSoC micro project.

Below are typical function patterns from JavaScript, and 
also has an example that matches the corresponding pattern. 
These examples come from many popular JavaScript projects. 
Because I want to make sure the hunk header would work well 
on real-world projects.

Common function's pattern for JavaScript
1. normal function
  `^[\t ]*((export[\t ]+)?((async|get|set)[\t ]+)?function[\t ]*([\t ]*\\*[\t ]*|[\t ]*)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)`
  example: 
  1. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L648
  2. https://github.com/mozilla/pdf.js/blob/ad4b2ce021277ff7cea8ec7e32775c65d74ee673/test/unit/evaluator_spec.js#L40
2. JavaScript variable declaration with a lambda expression
  `^^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*[\t ])[\t ]*=>[\t ]*\\{?)`
   example:
   1. https://github.com/webpack/webpack/blob/2279c5a2105ea1498b83a4854919aefe1a28c553/lib/ChunkGraph.js#L91
   2. https://github.com/webpack/webpack/blob/2279c5a2105ea1498b83a4854919aefe1a28c553/lib/ChunkGraph.js#L122
    
   I found sometimes would define function on this way. But this should only match the top level? Because 
   it may match inside the function, and the below code would match the wrong function[1].

3. exports methods by assigning an anonymous function
  `^[\t ]*(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)`
   example:
   1. https://github.com/webpack/webpack/blob/c181294865dca01b28e6e316636fef5f2aad4eb6/lib/dependencies/DynamicExports.js#L17
   2. https://github.com/ajaxorg/ace/blob/d95725983b363a616c584237013dfd36eaadbba4/lib/ace/lib/dom.js#L37
4. expression about assign function to LHS
  `^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)`
   example:
   1. https://github.com/ajaxorg/ace/blob/94422a4a892495564c56089af85019a8f8f24673/lib/ace/anchor.js#L102
   2. https://github.com/ajaxorg/ace/blob/d95725983b363a616c584237013dfd36eaadbba4/lib/ace/lib/dom.js#L37
   3. https://github.com/ajaxorg/ace/blob/4257621787b4253d6d493611f4ec5a37829da323/lib/ace/search.js#L350
   4. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L299
   
   Maybe this should only match on the 0,1,2 indent level? Because JavaScript may match the function assignment in another function.
5. normal function in object literal
  `^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)`
    1. https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
    2. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L242
6. function in class
  `^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)`
    
   This regex is tricky because the class's function is no function keyword in JavaScript. 
   If you write the regex to match them, it will match many non-function declaration things!!! 
   Like examples below:
   1. the non-function matches
     1. https://github.com/ajaxorg/ace/blob/94422a4a892495564c56089af85019a8f8f24673/lib/ace/anchor.js#L58
     2. https://github.com/ajaxorg/ace/blob/d95725983b363a616c584237013dfd36eaadbba4/lib/ace/lib/dom.js#L132
   2. the function in class
     1. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L1929
     2. https://github.com/webpack/webpack/blob/ccecc17c01af96edddb931a76e7a3b21ef2969d8/lib/Chunk.js#L179
     3. https://github.com/webpack/webpack/blob/612de998f186a9bb2fe8769a91678df689a0541e/lib/Module.js#L242
     4. https://github.com/mozilla/pdf.js/blob/5cf116a958548f6596674bf8d5ca0fe64aa2df3c/web/view_history.js#L75
     5. https://github.com/mozilla/pdf.js/blob/5cf116a958548f6596674bf8d5ca0fe64aa2df3c/web/view_history.js#L89
   
    My solution is to add some negation rules, and one rule is skipping the keywords that may have characters immediately 
    following them in the parenthesis, rule is `!^[ \t]*(if|do|while|for|with|switch|catch|import|return)`.
    Another negation rule is only before this 「function in class」 regex, that skips the line's indent level more than 
    one because most of the function in class has one indent level(the class is on top-level). The negation rule is 
    `!^(\t{2,}|[ ]{5,})`.
    
    But this is not enough, because maybe has function call on one indent level. So need an negation rule for skipping 
    statement. The negation rule is `!^.*;[ \t]*`. But the bad news is JavaScript's statement can end without a semicolon. 
    So it still has an opportunity to recognize function call as the function declaration if the code's statement does not 
    end with semicolons.

Word's pattern for JavaScript
In this part, I reference the formal ECMA specification heavily[2].
JavaScript has some special syntax, such as numbers can be separated with an underscore for readability[3]. 
And has BigInt literal, which is number end with a 「n」 character. So the number-related regex would be some 
differences with another language.

In the last, I had a fork git project on Github. And has the CI's result, the all test cases pass[4].

[1] https://github.com/webpack/webpack/blob/2279c5a2105ea1498b83a4854919aefe1a28c553/lib/ChunkGraph.js#L279
[2] https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
[3] https://v8.dev/features/numeric-separators
[4] https://github.com/a97410985/git/actions/runs/1933091300

xing zhi jiang (1):
  Add a diff driver for JavaScript languages.

 .gitignore                                    |  1 +
 Documentation/gitattributes.txt               |  2 +
 ...avascript-assignment-of-anonymous-function |  4 ++
 .../javascript-assignment-of-arrow-function   |  4 ++
 .../javascript-assignment-of-named-function   |  4 ++
 t/t4018/javascript-async-function             |  4 ++
 t/t4018/javascript-export-async-function      |  4 ++
 t/t4018/javascript-export-function            |  4 ++
 t/t4018/javascript-exports-anomyous-function  |  4 ++
 .../javascript-exports-anomyous-function-2    |  4 ++
 t/t4018/javascript-exports-function           |  4 ++
 t/t4018/javascript-function                   |  4 ++
 t/t4018/javascript-function-2                 | 10 ++++
 t/t4018/javascript-function-belong-to-IIFE    |  6 +++
 t/t4018/javascript-function-in-class          |  6 +++
 t/t4018/javascript-function-in-class-2        | 11 ++++
 t/t4018/javascript-function-in-object-literal |  7 +++
 t/t4018/javascript-generator-function         |  4 ++
 t/t4018/javascript-generator-function-2       |  4 ++
 t/t4018/javascript-getter-function-in-class   |  6 +++
 t/t4018/javascript-setter-function-in-class   |  6 +++
 .../javascript-skip-function-call-statement   |  7 +++
 t/t4018/javascript-skip-keywords              | 34 ++++++++++++
 t/t4018/javascript-static-function-in-class   |  6 +++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/javascript/expect                     | 52 +++++++++++++++++++
 t/t4034/javascript/post                       | 32 ++++++++++++
 t/t4034/javascript/pre                        | 32 ++++++++++++
 userdiff.c                                    | 38 ++++++++++++++
 29 files changed, 305 insertions(+)
 create mode 100644 t/t4018/javascript-assignment-of-anonymous-function
 create mode 100644 t/t4018/javascript-assignment-of-arrow-function
 create mode 100644 t/t4018/javascript-assignment-of-named-function
 create mode 100644 t/t4018/javascript-async-function
 create mode 100644 t/t4018/javascript-export-async-function
 create mode 100644 t/t4018/javascript-export-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function-2
 create mode 100644 t/t4018/javascript-exports-function
 create mode 100644 t/t4018/javascript-function
 create mode 100644 t/t4018/javascript-function-2
 create mode 100644 t/t4018/javascript-function-belong-to-IIFE
 create mode 100644 t/t4018/javascript-function-in-class
 create mode 100644 t/t4018/javascript-function-in-class-2
 create mode 100644 t/t4018/javascript-function-in-object-literal
 create mode 100644 t/t4018/javascript-generator-function
 create mode 100644 t/t4018/javascript-generator-function-2
 create mode 100644 t/t4018/javascript-getter-function-in-class
 create mode 100644 t/t4018/javascript-setter-function-in-class
 create mode 100644 t/t4018/javascript-skip-function-call-statement
 create mode 100644 t/t4018/javascript-skip-keywords
 create mode 100644 t/t4018/javascript-static-function-in-class
 create mode 100644 t/t4034/javascript/expect
 create mode 100644 t/t4034/javascript/post
 create mode 100644 t/t4034/javascript/pre


base-commit: e6ebfd0e8cbbd10878070c8a356b5ad1b3ca464e
-- 
2.35.1.273.ge6ebfd0e8c.dirty


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages.
  2022-03-04 13:08 [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language xing zhi jiang
@ 2022-03-04 13:08 ` xing zhi jiang
  2022-03-05 10:16   ` Johannes Sixt
  2022-03-05 13:41 ` [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language Johannes Sixt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: xing zhi jiang @ 2022-03-04 13:08 UTC (permalink / raw)
  To: git; +Cc: a97410985new

In the xfunction part that matches normal functions,
a variable declaration with an assignment of function, the function declaration
in the class, and also the function is object literal's property.

And in the word regex part, that matches numbers, punctuations, and also the
JavaScript identifier.
This part heavily references the formal ECMA sepcification[1].

[1]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar

Signed-off-by: xing zhi jiang <a97410985new@gmail.com>
---
 .gitignore                                    |  1 +
 Documentation/gitattributes.txt               |  2 +
 ...avascript-assignment-of-anonymous-function |  4 ++
 .../javascript-assignment-of-arrow-function   |  4 ++
 .../javascript-assignment-of-named-function   |  4 ++
 t/t4018/javascript-async-function             |  4 ++
 t/t4018/javascript-export-async-function      |  4 ++
 t/t4018/javascript-export-function            |  4 ++
 t/t4018/javascript-exports-anomyous-function  |  4 ++
 .../javascript-exports-anomyous-function-2    |  4 ++
 t/t4018/javascript-exports-function           |  4 ++
 t/t4018/javascript-function                   |  4 ++
 t/t4018/javascript-function-2                 | 10 ++++
 t/t4018/javascript-function-belong-to-IIFE    |  6 +++
 t/t4018/javascript-function-in-class          |  6 +++
 t/t4018/javascript-function-in-class-2        | 11 ++++
 t/t4018/javascript-function-in-object-literal |  7 +++
 t/t4018/javascript-generator-function         |  4 ++
 t/t4018/javascript-generator-function-2       |  4 ++
 t/t4018/javascript-getter-function-in-class   |  6 +++
 t/t4018/javascript-setter-function-in-class   |  6 +++
 .../javascript-skip-function-call-statement   |  7 +++
 t/t4018/javascript-skip-keywords              | 34 ++++++++++++
 t/t4018/javascript-static-function-in-class   |  6 +++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/javascript/expect                     | 52 +++++++++++++++++++
 t/t4034/javascript/post                       | 32 ++++++++++++
 t/t4034/javascript/pre                        | 32 ++++++++++++
 userdiff.c                                    | 38 ++++++++++++++
 29 files changed, 305 insertions(+)
 create mode 100644 t/t4018/javascript-assignment-of-anonymous-function
 create mode 100644 t/t4018/javascript-assignment-of-arrow-function
 create mode 100644 t/t4018/javascript-assignment-of-named-function
 create mode 100644 t/t4018/javascript-async-function
 create mode 100644 t/t4018/javascript-export-async-function
 create mode 100644 t/t4018/javascript-export-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function-2
 create mode 100644 t/t4018/javascript-exports-function
 create mode 100644 t/t4018/javascript-function
 create mode 100644 t/t4018/javascript-function-2
 create mode 100644 t/t4018/javascript-function-belong-to-IIFE
 create mode 100644 t/t4018/javascript-function-in-class
 create mode 100644 t/t4018/javascript-function-in-class-2
 create mode 100644 t/t4018/javascript-function-in-object-literal
 create mode 100644 t/t4018/javascript-generator-function
 create mode 100644 t/t4018/javascript-generator-function-2
 create mode 100644 t/t4018/javascript-getter-function-in-class
 create mode 100644 t/t4018/javascript-setter-function-in-class
 create mode 100644 t/t4018/javascript-skip-function-call-statement
 create mode 100644 t/t4018/javascript-skip-keywords
 create mode 100644 t/t4018/javascript-static-function-in-class
 create mode 100644 t/t4034/javascript/expect
 create mode 100644 t/t4034/javascript/post
 create mode 100644 t/t4034/javascript/pre

diff --git a/.gitignore b/.gitignore
index f817c509ec..de628c53f8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -244,3 +244,4 @@ Release/
 /git.VC.db
 *.dSYM
 /contrib/buildsystems/out
+/.cache
\ No newline at end of file
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 60984a4682..f6554a4651 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -828,6 +828,8 @@ patterns are available:
 
 - `java` suitable for source code in the Java language.
 
+- `javascript suitable for source code in the JavaScript language.
+
 - `markdown` suitable for Markdown documents.
 
 - `matlab` suitable for source code in the MATLAB and Octave languages.
diff --git a/t/t4018/javascript-assignment-of-anonymous-function b/t/t4018/javascript-assignment-of-anonymous-function
new file mode 100644
index 0000000000..5d4701e84c
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-anonymous-function
@@ -0,0 +1,4 @@
+const RIGHT = function (a, b) {
+	
+    return a + b; // ChangeMe
+};
\ No newline at end of file
diff --git a/t/t4018/javascript-assignment-of-arrow-function b/t/t4018/javascript-assignment-of-arrow-function
new file mode 100644
index 0000000000..6ab73ccb7a
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-arrow-function
@@ -0,0 +1,4 @@
+const RIGHT = (a, b) => {
+	
+    return a + b; // ChangeMe
+};
\ No newline at end of file
diff --git a/t/t4018/javascript-assignment-of-named-function b/t/t4018/javascript-assignment-of-named-function
new file mode 100644
index 0000000000..85d43123a6
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-named-function
@@ -0,0 +1,4 @@
+const RIGHT = function test (a, b) {
+	
+    return a + b; // ChangeMe
+};
\ No newline at end of file
diff --git a/t/t4018/javascript-async-function b/t/t4018/javascript-async-function
new file mode 100644
index 0000000000..ebc7c8c05b
--- /dev/null
+++ b/t/t4018/javascript-async-function
@@ -0,0 +1,4 @@
+async function RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-export-async-function b/t/t4018/javascript-export-async-function
new file mode 100644
index 0000000000..3cd60b7980
--- /dev/null
+++ b/t/t4018/javascript-export-async-function
@@ -0,0 +1,4 @@
+export async function RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-export-function b/t/t4018/javascript-export-function
new file mode 100644
index 0000000000..177ddec779
--- /dev/null
+++ b/t/t4018/javascript-export-function
@@ -0,0 +1,4 @@
+export function RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-exports-anomyous-function b/t/t4018/javascript-exports-anomyous-function
new file mode 100644
index 0000000000..45b0ecd659
--- /dev/null
+++ b/t/t4018/javascript-exports-anomyous-function
@@ -0,0 +1,4 @@
+exports.setFlagged = RIGHT => {
+	
+    return ChangeMe;
+};
\ No newline at end of file
diff --git a/t/t4018/javascript-exports-anomyous-function-2 b/t/t4018/javascript-exports-anomyous-function-2
new file mode 100644
index 0000000000..0c572bfde3
--- /dev/null
+++ b/t/t4018/javascript-exports-anomyous-function-2
@@ -0,0 +1,4 @@
+exports.RIGHT = (a, b, runtime) => {
+	
+    return ChangeMe;
+};
\ No newline at end of file
diff --git a/t/t4018/javascript-exports-function b/t/t4018/javascript-exports-function
new file mode 100644
index 0000000000..f1587fddac
--- /dev/null
+++ b/t/t4018/javascript-exports-function
@@ -0,0 +1,4 @@
+exports.RIGHT = function(document) {
+    
+    return ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-function b/t/t4018/javascript-function
new file mode 100644
index 0000000000..dd8ab54c9b
--- /dev/null
+++ b/t/t4018/javascript-function
@@ -0,0 +1,4 @@
+function RIGHT(a, b) {
+
+  return a + b; // ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-function-2 b/t/t4018/javascript-function-2
new file mode 100644
index 0000000000..d72063cdf0
--- /dev/null
+++ b/t/t4018/javascript-function-2
@@ -0,0 +1,10 @@
+function test(a, b) {
+  return {
+			RIGHT: function () {
+				currentUpdateRemovedChunks.forEach(function (chunkId) {
+					delete $installedChunks$[chunkId];
+				});
+				currentUpdateRemovedChunks = ChangeMe;
+   }
+  }
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-function-belong-to-IIFE b/t/t4018/javascript-function-belong-to-IIFE
new file mode 100644
index 0000000000..6a80118e8a
--- /dev/null
+++ b/t/t4018/javascript-function-belong-to-IIFE
@@ -0,0 +1,6 @@
+(function () {
+  this.$RIGHT = function (needle, modifier) {
+      let a = 5;
+      return ChangeMe;
+  };
+}).call(aaaa.prototype);
\ No newline at end of file
diff --git a/t/t4018/javascript-function-in-class b/t/t4018/javascript-function-in-class
new file mode 100644
index 0000000000..4b2f9c37e0
--- /dev/null
+++ b/t/t4018/javascript-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-function-in-class-2 b/t/t4018/javascript-function-in-class-2
new file mode 100644
index 0000000000..402c4aecc3
--- /dev/null
+++ b/t/t4018/javascript-function-in-class-2
@@ -0,0 +1,11 @@
+class Test {
+  RIGHT(
+      aaaaaaaaaa,
+      bbbbbbbbbb,
+      cccccccccc,
+      dddddddddd
+  ) {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-function-in-object-literal b/t/t4018/javascript-function-in-object-literal
new file mode 100644
index 0000000000..37f7ef72ee
--- /dev/null
+++ b/t/t4018/javascript-function-in-object-literal
@@ -0,0 +1,7 @@
+const obj = {
+    RIGHT: function (elems, callback, arg) {
+        var length, value;
+        // ...
+        return ChangeMe
+    }
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-generator-function b/t/t4018/javascript-generator-function
new file mode 100644
index 0000000000..caf0b9f04f
--- /dev/null
+++ b/t/t4018/javascript-generator-function
@@ -0,0 +1,4 @@
+function* RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-generator-function-2 b/t/t4018/javascript-generator-function-2
new file mode 100644
index 0000000000..6fc3b74a0d
--- /dev/null
+++ b/t/t4018/javascript-generator-function-2
@@ -0,0 +1,4 @@
+function *RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-getter-function-in-class b/t/t4018/javascript-getter-function-in-class
new file mode 100644
index 0000000000..0159541be7
--- /dev/null
+++ b/t/t4018/javascript-getter-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  get RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-setter-function-in-class b/t/t4018/javascript-setter-function-in-class
new file mode 100644
index 0000000000..3e17f47aa2
--- /dev/null
+++ b/t/t4018/javascript-setter-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  set RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-skip-function-call-statement b/t/t4018/javascript-skip-function-call-statement
new file mode 100644
index 0000000000..84b51514d2
--- /dev/null
+++ b/t/t4018/javascript-skip-function-call-statement
@@ -0,0 +1,7 @@
+class Test {
+  static RIGHT() {
+    haha();
+    haha2()
+    let b = ChangeMe;
+  }
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-skip-keywords b/t/t4018/javascript-skip-keywords
new file mode 100644
index 0000000000..1ed56c08de
--- /dev/null
+++ b/t/t4018/javascript-skip-keywords
@@ -0,0 +1,34 @@
+function RIGHT(a, b) {
+  import("./async1")
+  if (a > 1) {
+    // ...
+  }
+  do {
+    // ...
+  } while (i < 5);
+  for (const element of array1) {
+    console.log(element)
+  }
+  with(o) {
+    console.log(x)
+  }
+  switch (expr) {
+    case 'a':
+      // ...
+      break;
+    case 'b':
+      // ...
+      break;
+    default:
+      // ...
+  }
+  try {
+    // ...
+    return (a + c)
+  } 
+  catch (error) {
+    // ...
+  }
+
+  return a + b; // ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/javascript-static-function-in-class b/t/t4018/javascript-static-function-in-class
new file mode 100644
index 0000000000..efbccaf113
--- /dev/null
+++ b/t/t4018/javascript-static-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  static RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
\ No newline at end of file
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index d5abcf4b4c..33073edeca 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -324,6 +324,7 @@ test_language_driver dts
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
+test_language_driver javascript
 test_language_driver matlab
 test_language_driver objc
 test_language_driver pascal
diff --git a/t/t4034/javascript/expect b/t/t4034/javascript/expect
new file mode 100644
index 0000000000..602513c651
--- /dev/null
+++ b/t/t4034/javascript/expect
@@ -0,0 +1,52 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index b72238f..8bc3e3a 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,32 +1,32 @@<RESET>
+// DecimalLiteral<RESET>
+<RED>123<RESET>
+<RED>0.123<RESET>
+<RED>0.123e+5<RESET>
+<RED>0.123E+5<RESET>
+<RED>0.123e5<RESET>
+<RED>1222222222222222223334444n<RESET><GREEN>124<RESET>
+<GREEN>0.124<RESET>
+<GREEN>0.123e-5<RESET>
+<GREEN>0.123E-5<RESET>
+<GREEN>0.123E5<RESET>
+<GREEN>12222222222222222233344445n<RESET>
+// HexIntegerLiteral<RESET>
+<RED>0x10<RESET>
+<RED>0X6Fa1<RESET>
+<RED>0x123_456<RESET>
+<RED>0x1234182989812f1289an<RESET><GREEN>0x11<RESET>
+<GREEN>0X5Fa1<RESET>
+<GREEN>0x123_756<RESET>
+<GREEN>0x1234182989812f1289bn<RESET>
+// OctalIntegerLiteral<RESET>
+<RED>05<RESET>
+<RED>0o6<RESET>
+<RED>0O7<RESET>
+<RED>0512_567<RESET>
+<RED>0o424242424242424242424242424242666666n<RESET><GREEN>06<RESET>
+<GREEN>0o5<RESET>
+<GREEN>0O4<RESET>
+<GREEN>0511_567<RESET>
+<GREEN>0o424242424242424242424242424242666667n<RESET>
+// BinaryIntegerLiteral<RESET>
+<RED>0b1001<RESET>
+<RED>0B0110<RESET>
+<RED>0b0001_1001_0011<RESET>
+<RED>0b1111111111111111111111111111111111111n<RESET><GREEN>0b1101<RESET>
+<GREEN>0B0010<RESET>
+<GREEN>0b0001_1101_0011<RESET>
+<GREEN>0b11111111111111000011111111111111111n<RESET>
+// punctuations<RESET>
+{<RED>a<RESET><GREEN>b<RESET>} (<RED>a<RESET><GREEN>b<RESET>)
+<RED>a<RESET><GREEN>b<RESET>;
+[<RED>1,<RESET>2<GREEN>,3<RESET>]
+[<RED>1, 2,<RESET> ...<RED>params<RESET><GREEN>params_v2<RESET> ]
+a<RED><=<RESET><GREEN>=<RESET>2 a<RED>>=<RESET><GREEN>=<RESET>2 a<RED>==<RESET><GREEN>=<RESET>2 a<RED>!=<RESET><GREEN>=<RESET>2 a<RED>===<RESET><GREEN>=<RESET>2 a<RED>!==<RESET><GREEN>=<RESET>2 a<RED>^=<RESET><GREEN>=<RESET>2 a<RED>=><RESET><GREEN>=<RESET>2
+a<RED>+=<RESET><GREEN>-=<RESET>b a<RED>*=<RESET><GREEN>%=<RESET>b a<RED>**=<RESET><GREEN>&&=<RESET>b a<RED>||=<RESET><GREEN>|=<RESET>b
+b<RED>+<RESET><GREEN>-<RESET>c a<RED>--<RESET><GREEN>++<RESET> a<RED>>><RESET><GREEN><<<RESET>b a<RED>>>><RESET><GREEN>>>>=<RESET>b a<RED>>>=<RESET><GREEN><<=<RESET>b
+a<RED>&&<RESET><GREEN>&<RESET>b a<RED>||<RESET><GREEN>|<RESET>b a<RED>&&=<RESET><GREEN>??=<RESET>b
diff --git a/t/t4034/javascript/post b/t/t4034/javascript/post
new file mode 100644
index 0000000000..8bc3e3af12
--- /dev/null
+++ b/t/t4034/javascript/post
@@ -0,0 +1,32 @@
+// DecimalLiteral
+124
+0.124
+0.123e-5
+0.123E-5
+0.123E5
+12222222222222222233344445n
+// HexIntegerLiteral
+0x11
+0X5Fa1
+0x123_756
+0x1234182989812f1289bn
+// OctalIntegerLiteral
+06
+0o5
+0O4
+0511_567
+0o424242424242424242424242424242666667n
+// BinaryIntegerLiteral
+0b1101
+0B0010
+0b0001_1101_0011
+0b11111111111111000011111111111111111n
+// punctuations
+{b} (b)
+b;
+[2,3]
+[ ...params_v2 ]
+a=2 a=2 a=2 a=2 a=2 a=2 a=2 a=2
+a-=b a%=b a&&=b a|=b
+b-c a++ a<<b a>>>=b a<<=b
+a&b a|b a??=b
\ No newline at end of file
diff --git a/t/t4034/javascript/pre b/t/t4034/javascript/pre
new file mode 100644
index 0000000000..b72238f779
--- /dev/null
+++ b/t/t4034/javascript/pre
@@ -0,0 +1,32 @@
+// DecimalLiteral
+123
+0.123
+0.123e+5
+0.123E+5
+0.123e5
+1222222222222222223334444n
+// HexIntegerLiteral
+0x10
+0X6Fa1
+0x123_456
+0x1234182989812f1289an
+// OctalIntegerLiteral
+05
+0o6
+0O7
+0512_567
+0o424242424242424242424242424242666666n
+// BinaryIntegerLiteral
+0b1001
+0B0110
+0b0001_1001_0011
+0b1111111111111111111111111111111111111n
+// punctuations
+{a} (a)
+a;
+[1,2]
+[ 1, 2, ...params ]
+a<=2 a>=2 a==2 a!=2 a===2 a!==2 a^=2 a=>2
+a+=b a*=b a**=b a||=b
+b+c a-- a>>b a>>>b a>>=b
+a&&b a||b a&&=b
\ No newline at end of file
diff --git a/userdiff.c b/userdiff.c
index 8578cb0d12..a6a341e3c1 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -168,6 +168,44 @@ PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+
+PATTERNS("javascript",
+	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
+	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
+	 /* don't match statement */
+	 "!^.*;[ \t]*\n"
+	 /* match normal function */
+	 "^[\t ]*((export[\t ]+)?(async[\t ]+)?function[\t ]*([\t ]*\\*[\t ]*|[\t ]*)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
+	 /* match JavaScript variable declaration with a lambda expression */
+	 "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
+	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*[\t ])[\t ]*=>[\t ]*\\{?)\n"
+	 /* match exports for anonymous fucntion */
+	 "^[\t ]*(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
+	 /* match assign function to LHS */
+	 "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"
+	 /* match normal function in object literal */
+	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n"
+	 /* don't match the function in class, which has more than one ident level */
+	 "!^(\t{2,}|[ ]{5,})\n"
+	 /* match function in class */
+	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",
+	 /* word regex */
+	 /* hexIntegerLiteral and bigHexIntegerLiteral*/
+	 "0[xX][0-9a-fA-F][_0-9a-fA-F]*n?|"
+	 /* octalIntegerLiteral and bigOctalIntegerLiteral */
+	 "0[oO]?[0-7][_0-7]*n?|"
+	 /* binaryIntegerLiteral and bigBinaryIntegerLiteral */
+	 "(0[bB][01][_01]*n?)|"
+	 /* decimalLiteral */
+	 "(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?|"
+	 /* bigDecimalLiteral */
+	 "(0|[1-9][_0-9]*)n|"
+	 /* punctuations */
+	 "\\{|\\}|\\(|\\)|\\.|\\.{3}|;|,|<|>|<=|>=|==|!=|={3}|!==|\\+|-|\\*|/|%|\\*{2}|"
+	 "\\+{2}|--|<<|>>|>>>|&|\\||\\^|!|~|&&|\\|{2}|\\?{1,2}|:|=|\\+=|-=|\\*=|%=|\\*{2}=|"
+	 "<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>|"
+	 /* identifiers */
+	 "[$_[:alpha:]][$_[:alnum:]]*"),
 PATTERNS("markdown",
 	 "^ {0,3}#{1,6}[ \t].*",
 	 /* -- */
-- 
2.35.1.273.ge6ebfd0e8c.dirty


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages.
  2022-03-04 13:08 ` [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages xing zhi jiang
@ 2022-03-05 10:16   ` Johannes Sixt
  2022-03-07 15:10     ` xing-zhi jiang
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Sixt @ 2022-03-05 10:16 UTC (permalink / raw)
  To: xing zhi jiang; +Cc: git

Thank you for your contribution!

Am 04.03.22 um 14:08 schrieb xing zhi jiang:
> In the xfunction part that matches normal functions,
> a variable declaration with an assignment of function, the function declaration
> in the class, and also the function is object literal's property.

On the first read, I stumbled over the last half-sentence. Is this
JavaScript-lingo for the construct

   x = { foo: function() { ... } }

A parenthetical note in this regard would be helpful.

> And in the word regex part, that matches numbers, punctuations, and also the
> JavaScript identifier.
> This part heavily references the formal ECMA sepcification[1].
> 
> [1]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar

After having seen the regex, to follow the syntax specification is
unnecessarily tight. If you follow my suggestions, the note should
probably be rewritten, but keeping the link to the language reference is
certainly helpful.

> 
> Signed-off-by: xing zhi jiang <a97410985new@gmail.com>
> ---
>  .gitignore                                    |  1 +
>  Documentation/gitattributes.txt               |  2 +
>  ...avascript-assignment-of-anonymous-function |  4 ++
>  .../javascript-assignment-of-arrow-function   |  4 ++
>  .../javascript-assignment-of-named-function   |  4 ++
>  t/t4018/javascript-async-function             |  4 ++
>  t/t4018/javascript-export-async-function      |  4 ++
>  t/t4018/javascript-export-function            |  4 ++
>  t/t4018/javascript-exports-anomyous-function  |  4 ++
>  .../javascript-exports-anomyous-function-2    |  4 ++
>  t/t4018/javascript-exports-function           |  4 ++
>  t/t4018/javascript-function                   |  4 ++
>  t/t4018/javascript-function-2                 | 10 ++++
>  t/t4018/javascript-function-belong-to-IIFE    |  6 +++
>  t/t4018/javascript-function-in-class          |  6 +++
>  t/t4018/javascript-function-in-class-2        | 11 ++++
>  t/t4018/javascript-function-in-object-literal |  7 +++
>  t/t4018/javascript-generator-function         |  4 ++
>  t/t4018/javascript-generator-function-2       |  4 ++
>  t/t4018/javascript-getter-function-in-class   |  6 +++
>  t/t4018/javascript-setter-function-in-class   |  6 +++
>  .../javascript-skip-function-call-statement   |  7 +++
>  t/t4018/javascript-skip-keywords              | 34 ++++++++++++
>  t/t4018/javascript-static-function-in-class   |  6 +++
>  t/t4034-diff-words.sh                         |  1 +
>  t/t4034/javascript/expect                     | 52 +++++++++++++++++++
>  t/t4034/javascript/post                       | 32 ++++++++++++
>  t/t4034/javascript/pre                        | 32 ++++++++++++
>  userdiff.c                                    | 38 ++++++++++++++
>  29 files changed, 305 insertions(+)
>  create mode 100644 t/t4018/javascript-assignment-of-anonymous-function
>  create mode 100644 t/t4018/javascript-assignment-of-arrow-function
>  create mode 100644 t/t4018/javascript-assignment-of-named-function
>  create mode 100644 t/t4018/javascript-async-function
>  create mode 100644 t/t4018/javascript-export-async-function
>  create mode 100644 t/t4018/javascript-export-function
>  create mode 100644 t/t4018/javascript-exports-anomyous-function
>  create mode 100644 t/t4018/javascript-exports-anomyous-function-2
>  create mode 100644 t/t4018/javascript-exports-function
>  create mode 100644 t/t4018/javascript-function
>  create mode 100644 t/t4018/javascript-function-2
>  create mode 100644 t/t4018/javascript-function-belong-to-IIFE
>  create mode 100644 t/t4018/javascript-function-in-class
>  create mode 100644 t/t4018/javascript-function-in-class-2
>  create mode 100644 t/t4018/javascript-function-in-object-literal
>  create mode 100644 t/t4018/javascript-generator-function
>  create mode 100644 t/t4018/javascript-generator-function-2
>  create mode 100644 t/t4018/javascript-getter-function-in-class
>  create mode 100644 t/t4018/javascript-setter-function-in-class
>  create mode 100644 t/t4018/javascript-skip-function-call-statement
>  create mode 100644 t/t4018/javascript-skip-keywords
>  create mode 100644 t/t4018/javascript-static-function-in-class
>  create mode 100644 t/t4034/javascript/expect
>  create mode 100644 t/t4034/javascript/post
>  create mode 100644 t/t4034/javascript/pre
> 
> diff --git a/.gitignore b/.gitignore
> index f817c509ec..de628c53f8 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -244,3 +244,4 @@ Release/
>  /git.VC.db
>  *.dSYM
>  /contrib/buildsystems/out
> +/.cache
> \ No newline at end of file

Do not include this change. It does not belong to this commit.

> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 60984a4682..f6554a4651 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -828,6 +828,8 @@ patterns are available:
>  
>  - `java` suitable for source code in the Java language.
>  
> +- `javascript suitable for source code in the JavaScript language.

Please do not forget the closing quote.

> +
>  - `markdown` suitable for Markdown documents.
>  
>  - `matlab` suitable for source code in the MATLAB and Octave languages.
> diff --git a/t/t4018/javascript-assignment-of-anonymous-function b/t/t4018/javascript-assignment-of-anonymous-function
> new file mode 100644
> index 0000000000..5d4701e84c
> --- /dev/null
> +++ b/t/t4018/javascript-assignment-of-anonymous-function
> @@ -0,0 +1,4 @@
> +const RIGHT = function (a, b) {
> +	
> +    return a + b; // ChangeMe
> +};
> \ No newline at end of file

Notice this "No newline at end of file". Please complete the last line
of the file, i.e. do add the last line break. Same for all other new
files introduced in this patch.

> diff --git a/t/t4018/javascript-assignment-of-arrow-function b/t/t4018/javascript-assignment-of-arrow-function
> new file mode 100644
> index 0000000000..6ab73ccb7a
> --- /dev/null
> +++ b/t/t4018/javascript-assignment-of-arrow-function
> @@ -0,0 +1,4 @@
> +const RIGHT = (a, b) => {
> +	
> +    return a + b; // ChangeMe
> +};
> \ No newline at end of file
> diff --git a/t/t4018/javascript-assignment-of-named-function b/t/t4018/javascript-assignment-of-named-function
> new file mode 100644
> index 0000000000..85d43123a6
> --- /dev/null
> +++ b/t/t4018/javascript-assignment-of-named-function
> @@ -0,0 +1,4 @@
> +const RIGHT = function test (a, b) {
> +	
> +    return a + b; // ChangeMe
> +};
> \ No newline at end of file
> diff --git a/t/t4018/javascript-async-function b/t/t4018/javascript-async-function
> new file mode 100644
> index 0000000000..ebc7c8c05b
> --- /dev/null
> +++ b/t/t4018/javascript-async-function
> @@ -0,0 +1,4 @@
> +async function RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-export-async-function b/t/t4018/javascript-export-async-function
> new file mode 100644
> index 0000000000..3cd60b7980
> --- /dev/null
> +++ b/t/t4018/javascript-export-async-function
> @@ -0,0 +1,4 @@
> +export async function RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-export-function b/t/t4018/javascript-export-function
> new file mode 100644
> index 0000000000..177ddec779
> --- /dev/null
> +++ b/t/t4018/javascript-export-function
> @@ -0,0 +1,4 @@
> +export function RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-exports-anomyous-function b/t/t4018/javascript-exports-anomyous-function
> new file mode 100644
> index 0000000000..45b0ecd659
> --- /dev/null
> +++ b/t/t4018/javascript-exports-anomyous-function
> @@ -0,0 +1,4 @@
> +exports.setFlagged = RIGHT => {
> +	
> +    return ChangeMe;
> +};
> \ No newline at end of file
> diff --git a/t/t4018/javascript-exports-anomyous-function-2 b/t/t4018/javascript-exports-anomyous-function-2
> new file mode 100644
> index 0000000000..0c572bfde3
> --- /dev/null
> +++ b/t/t4018/javascript-exports-anomyous-function-2
> @@ -0,0 +1,4 @@
> +exports.RIGHT = (a, b, runtime) => {
> +	
> +    return ChangeMe;
> +};
> \ No newline at end of file
> diff --git a/t/t4018/javascript-exports-function b/t/t4018/javascript-exports-function
> new file mode 100644
> index 0000000000..f1587fddac
> --- /dev/null
> +++ b/t/t4018/javascript-exports-function
> @@ -0,0 +1,4 @@
> +exports.RIGHT = function(document) {
> +    
> +    return ChangeMe
> +}

Not a particularly important point, but the correct syntax requires a
semicolon here, I guess.

> \ No newline at end of file
> diff --git a/t/t4018/javascript-function b/t/t4018/javascript-function
> new file mode 100644
> index 0000000000..dd8ab54c9b
> --- /dev/null
> +++ b/t/t4018/javascript-function
> @@ -0,0 +1,4 @@
> +function RIGHT(a, b) {
> +
> +  return a + b; // ChangeMe
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-function-2 b/t/t4018/javascript-function-2
> new file mode 100644
> index 0000000000..d72063cdf0
> --- /dev/null
> +++ b/t/t4018/javascript-function-2
> @@ -0,0 +1,10 @@
> +function test(a, b) {
> +  return {
> +			RIGHT: function () {
> +				currentUpdateRemovedChunks.forEach(function (chunkId) {

This line is a decoy that is not picked up. Nice.

> +					delete $installedChunks$[chunkId];
> +				});
> +				currentUpdateRemovedChunks = ChangeMe;
> +   }
> +  }
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-function-belong-to-IIFE b/t/t4018/javascript-function-belong-to-IIFE
> new file mode 100644
> index 0000000000..6a80118e8a
> --- /dev/null
> +++ b/t/t4018/javascript-function-belong-to-IIFE
> @@ -0,0 +1,6 @@
> +(function () {
> +  this.$RIGHT = function (needle, modifier) {
> +      let a = 5;
> +      return ChangeMe;
> +  };
> +}).call(aaaa.prototype);
> \ No newline at end of file
> diff --git a/t/t4018/javascript-function-in-class b/t/t4018/javascript-function-in-class
> new file mode 100644
> index 0000000000..4b2f9c37e0
> --- /dev/null
> +++ b/t/t4018/javascript-function-in-class
> @@ -0,0 +1,6 @@
> +class Test {
> +  RIGHT() {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-function-in-class-2 b/t/t4018/javascript-function-in-class-2
> new file mode 100644
> index 0000000000..402c4aecc3
> --- /dev/null
> +++ b/t/t4018/javascript-function-in-class-2
> @@ -0,0 +1,11 @@
> +class Test {
> +  RIGHT(
> +      aaaaaaaaaa,
> +      bbbbbbbbbb,
> +      cccccccccc,
> +      dddddddddd
> +  ) {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> \ No newline at end of file

In the above two, we see class member functions. Is there a test case
where the function parameter is on the same line as the function name or
is that one of the difficult cases?

> diff --git a/t/t4018/javascript-function-in-object-literal b/t/t4018/javascript-function-in-object-literal
> new file mode 100644
> index 0000000000..37f7ef72ee
> --- /dev/null
> +++ b/t/t4018/javascript-function-in-object-literal
> @@ -0,0 +1,7 @@
> +const obj = {
> +    RIGHT: function (elems, callback, arg) {
> +        var length, value;
> +        // ...
> +        return ChangeMe
> +    }
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-generator-function b/t/t4018/javascript-generator-function
> new file mode 100644
> index 0000000000..caf0b9f04f
> --- /dev/null
> +++ b/t/t4018/javascript-generator-function
> @@ -0,0 +1,4 @@
> +function* RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-generator-function-2 b/t/t4018/javascript-generator-function-2
> new file mode 100644
> index 0000000000..6fc3b74a0d
> --- /dev/null
> +++ b/t/t4018/javascript-generator-function-2
> @@ -0,0 +1,4 @@
> +function *RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-getter-function-in-class b/t/t4018/javascript-getter-function-in-class
> new file mode 100644
> index 0000000000..0159541be7
> --- /dev/null
> +++ b/t/t4018/javascript-getter-function-in-class
> @@ -0,0 +1,6 @@
> +class Test {
> +  get RIGHT() {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-setter-function-in-class b/t/t4018/javascript-setter-function-in-class
> new file mode 100644
> index 0000000000..3e17f47aa2
> --- /dev/null
> +++ b/t/t4018/javascript-setter-function-in-class
> @@ -0,0 +1,6 @@
> +class Test {
> +  set RIGHT() {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> \ No newline at end of file
> diff --git a/t/t4018/javascript-skip-function-call-statement b/t/t4018/javascript-skip-function-call-statement
> new file mode 100644
> index 0000000000..84b51514d2
> --- /dev/null
> +++ b/t/t4018/javascript-skip-function-call-statement
> @@ -0,0 +1,7 @@
> +class Test {
> +  static RIGHT() {
> +    haha();
> +    haha2()
> +    let b = ChangeMe;
> +  }
> +}

Good call to include this test case!

> \ No newline at end of file
> diff --git a/t/t4018/javascript-skip-keywords b/t/t4018/javascript-skip-keywords
> new file mode 100644
> index 0000000000..1ed56c08de
> --- /dev/null
> +++ b/t/t4018/javascript-skip-keywords
> @@ -0,0 +1,34 @@
> +function RIGHT(a, b) {
> +  import("./async1")
> +  if (a > 1) {
> +    // ...
> +  }
> +  do {
> +    // ...
> +  } while (i < 5);
> +  for (const element of array1) {
> +    console.log(element)
> +  }
> +  with(o) {
> +    console.log(x)
> +  }
> +  switch (expr) {
> +    case 'a':
> +      // ...
> +      break;
> +    case 'b':
> +      // ...
> +      break;
> +    default:
> +      // ...
> +  }
> +  try {
> +    // ...
> +    return (a + c)
> +  } 
> +  catch (error) {
> +    // ...
> +  }
> +
> +  return a + b; // ChangeMe
> +}

Very well!

> \ No newline at end of file
> diff --git a/t/t4018/javascript-static-function-in-class b/t/t4018/javascript-static-function-in-class
> new file mode 100644
> index 0000000000..efbccaf113
> --- /dev/null
> +++ b/t/t4018/javascript-static-function-in-class
> @@ -0,0 +1,6 @@
> +class Test {
> +  static RIGHT() {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> \ No newline at end of file
> diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
> index d5abcf4b4c..33073edeca 100755
> --- a/t/t4034-diff-words.sh
> +++ b/t/t4034-diff-words.sh
> @@ -324,6 +324,7 @@ test_language_driver dts
>  test_language_driver fortran
>  test_language_driver html
>  test_language_driver java
> +test_language_driver javascript
>  test_language_driver matlab
>  test_language_driver objc
>  test_language_driver pascal
> diff --git a/t/t4034/javascript/expect b/t/t4034/javascript/expect
> new file mode 100644
> index 0000000000..602513c651
> --- /dev/null
> +++ b/t/t4034/javascript/expect
> @@ -0,0 +1,52 @@
> +<BOLD>diff --git a/pre b/post<RESET>
> +<BOLD>index b72238f..8bc3e3a 100644<RESET>
> +<BOLD>--- a/pre<RESET>
> +<BOLD>+++ b/post<RESET>
> +<CYAN>@@ -1,32 +1,32 @@<RESET>
> +// DecimalLiteral<RESET>
> +<RED>123<RESET>
> +<RED>0.123<RESET>
> +<RED>0.123e+5<RESET>
> +<RED>0.123E+5<RESET>
> +<RED>0.123e5<RESET>
> +<RED>1222222222222222223334444n<RESET><GREEN>124<RESET>
> +<GREEN>0.124<RESET>
> +<GREEN>0.123e-5<RESET>
> +<GREEN>0.123E-5<RESET>
> +<GREEN>0.123E5<RESET>
> +<GREEN>12222222222222222233344445n<RESET>
> +// HexIntegerLiteral<RESET>
> +<RED>0x10<RESET>
> +<RED>0X6Fa1<RESET>
> +<RED>0x123_456<RESET>
> +<RED>0x1234182989812f1289an<RESET><GREEN>0x11<RESET>
> +<GREEN>0X5Fa1<RESET>
> +<GREEN>0x123_756<RESET>
> +<GREEN>0x1234182989812f1289bn<RESET>
> +// OctalIntegerLiteral<RESET>
> +<RED>05<RESET>
> +<RED>0o6<RESET>
> +<RED>0O7<RESET>
> +<RED>0512_567<RESET>
> +<RED>0o424242424242424242424242424242666666n<RESET><GREEN>06<RESET>
> +<GREEN>0o5<RESET>
> +<GREEN>0O4<RESET>
> +<GREEN>0511_567<RESET>
> +<GREEN>0o424242424242424242424242424242666667n<RESET>
> +// BinaryIntegerLiteral<RESET>
> +<RED>0b1001<RESET>
> +<RED>0B0110<RESET>
> +<RED>0b0001_1001_0011<RESET>
> +<RED>0b1111111111111111111111111111111111111n<RESET><GREEN>0b1101<RESET>
> +<GREEN>0B0010<RESET>
> +<GREEN>0b0001_1101_0011<RESET>
> +<GREEN>0b11111111111111000011111111111111111n<RESET>
> +// punctuations<RESET>
> +{<RED>a<RESET><GREEN>b<RESET>} (<RED>a<RESET><GREEN>b<RESET>)
> +<RED>a<RESET><GREEN>b<RESET>;
> +[<RED>1,<RESET>2<GREEN>,3<RESET>]
> +[<RED>1, 2,<RESET> ...<RED>params<RESET><GREEN>params_v2<RESET> ]
> +a<RED><=<RESET><GREEN>=<RESET>2 a<RED>>=<RESET><GREEN>=<RESET>2 a<RED>==<RESET><GREEN>=<RESET>2 a<RED>!=<RESET><GREEN>=<RESET>2 a<RED>===<RESET><GREEN>=<RESET>2 a<RED>!==<RESET><GREEN>=<RESET>2 a<RED>^=<RESET><GREEN>=<RESET>2 a<RED>=><RESET><GREEN>=<RESET>2
> +a<RED>+=<RESET><GREEN>-=<RESET>b a<RED>*=<RESET><GREEN>%=<RESET>b a<RED>**=<RESET><GREEN>&&=<RESET>b a<RED>||=<RESET><GREEN>|=<RESET>b
> +b<RED>+<RESET><GREEN>-<RESET>c a<RED>--<RESET><GREEN>++<RESET> a<RED>>><RESET><GREEN><<<RESET>b a<RED>>>><RESET><GREEN>>>>=<RESET>b a<RED>>>=<RESET><GREEN><<=<RESET>b
> +a<RED>&&<RESET><GREEN>&<RESET>b a<RED>||<RESET><GREEN>|<RESET>b a<RED>&&=<RESET><GREEN>??=<RESET>b

This looks good! I see many changes in operators being tested.

> diff --git a/t/t4034/javascript/post b/t/t4034/javascript/post
> new file mode 100644
> index 0000000000..8bc3e3af12
> --- /dev/null
> +++ b/t/t4034/javascript/post
> @@ -0,0 +1,32 @@
> +// DecimalLiteral
> +124
> +0.124
> +0.123e-5
> +0.123E-5
> +0.123E5
> +12222222222222222233344445n
> +// HexIntegerLiteral
> +0x11
> +0X5Fa1
> +0x123_756
> +0x1234182989812f1289bn
> +// OctalIntegerLiteral
> +06
> +0o5
> +0O4
> +0511_567
> +0o424242424242424242424242424242666667n
> +// BinaryIntegerLiteral
> +0b1101
> +0B0010
> +0b0001_1101_0011
> +0b11111111111111000011111111111111111n
> +// punctuations
> +{b} (b)
> +b;
> +[2,3]
> +[ ...params_v2 ]
> +a=2 a=2 a=2 a=2 a=2 a=2 a=2 a=2
> +a-=b a%=b a&&=b a|=b
> +b-c a++ a<<b a>>>=b a<<=b
> +a&b a|b a??=b
> \ No newline at end of file
> diff --git a/t/t4034/javascript/pre b/t/t4034/javascript/pre
> new file mode 100644
> index 0000000000..b72238f779
> --- /dev/null
> +++ b/t/t4034/javascript/pre
> @@ -0,0 +1,32 @@
> +// DecimalLiteral
> +123
> +0.123
> +0.123e+5
> +0.123E+5
> +0.123e5
> +1222222222222222223334444n
> +// HexIntegerLiteral
> +0x10
> +0X6Fa1
> +0x123_456
> +0x1234182989812f1289an
> +// OctalIntegerLiteral
> +05
> +0o6
> +0O7
> +0512_567
> +0o424242424242424242424242424242666666n
> +// BinaryIntegerLiteral
> +0b1001
> +0B0110
> +0b0001_1001_0011
> +0b1111111111111111111111111111111111111n
> +// punctuations
> +{a} (a)
> +a;
> +[1,2]
> +[ 1, 2, ...params ]
> +a<=2 a>=2 a==2 a!=2 a===2 a!==2 a^=2 a=>2
> +a+=b a*=b a**=b a||=b
> +b+c a-- a>>b a>>>b a>>=b
> +a&&b a||b a&&=b
> \ No newline at end of file
> diff --git a/userdiff.c b/userdiff.c
> index 8578cb0d12..a6a341e3c1 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -168,6 +168,44 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +
> +PATTERNS("javascript",
> +	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
> +	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"

These will not match

	}while (expr)

note the absent blank before the keyword, but that is an acceptable
trade-off to keep things simple. Good.

> +	 /* don't match statement */
> +	 "!^.*;[ \t]*\n"

This regexp can be reduced to

	"!;\n"

no?

> +	 /* match normal function */
> +	 "^[\t ]*((export[\t ]+)?(async[\t ]+)?function[\t ]*([\t ]*\\*[\t ]*|[\t ]*)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"

Good. One note though: keyword "function" can optionally be followed by
an asterisk '*'. You can probably simplify the middle part to

	...function[\t *]*...identifier...

> +	 /* match JavaScript variable declaration with a lambda expression */
> +	 "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> +	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*[\t ])[\t ]*=>[\t ]*\\{?)\n"

Let's break this down:

	"^[\t ]*"
	    "("
	        "(const|let|var)[\t ]*"
	        "[$_[:alpha:]][$_[:alnum:]]*[\t ]*"
	        "=[\t ]*"
	        "("
	            "\\(.*\\)"
	            "|"
	            "[$_[:alpha:]][$_[:alnum:]]*[\t ]"
	        ")[\t ]*"
	        "=>[\t ]*"
	        "\\{?"
	    ")\n"

Can you not have

	var f = foo=>{

because I see that whitespace is required between the identifier and "=>"?

> +	 /* match exports for anonymous fucntion */
> +	 "^[\t ]*(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"

Here, whitespace is not required. Is the above an oversight?

BTW, can keyword "exports" be used for something other than functions?

> +	 /* match assign function to LHS */
> +	 "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"

This makes me think that whenever keyword "function" appears, then we
see the beginning of a function. This would allow a simple pattern
upfront that picks out all functions defined with this keyword, and all
other patterns need only be concerned with the exceptional cases.

	/* "function" is first non-space token */
	"^[\t ]*function[\t ].*)\n"
	/* "function" is not first token */
	"^.*[^$_[:alnum:]]function[\t ].*\n"

> +	 /* match normal function in object literal */
> +	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n"
> +	 /* don't match the function in class, which has more than one ident level */
> +	 "!^(\t{2,}|[ ]{5,})\n"

For some, hopefully universally agreed upon in the JavaScript community,
definition of indentation level. Ok...

> +	 /* match function in class */
> +	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",

Aren't "get" and "set" as universal identifiers of functions or can they
occur on other contexts? Thinking of it, they can occur in comments
everywhere, so they would pick up too many matches if treated like my
"function" proposal above.

> +	 /* word regex */
> +	 /* hexIntegerLiteral and bigHexIntegerLiteral*/
> +	 "0[xX][0-9a-fA-F][_0-9a-fA-F]*n?|"
> +	 /* octalIntegerLiteral and bigOctalIntegerLiteral */
> +	 "0[oO]?[0-7][_0-7]*n?|"
> +	 /* binaryIntegerLiteral and bigBinaryIntegerLiteral */
> +	 "(0[bB][01][_01]*n?)|"
> +	 /* decimalLiteral */
> +	 "(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?|"
> +	 /* bigDecimalLiteral */
> +	 "(0|[1-9][_0-9]*)n|"

You do not have to make the word-regex so tight that it excludes
incorrect literals because you can assume that incorrect literals will
not occur. In particular integers beginning with a 0 need not be treated
specially. You can fold the octal and decimal integers into a single
expression:

	"[0-9][oO]?[_0-9.]*)([eE][+-]?[_0-9]+)?n?"

Are floatingpoint literals beginning with a decimal point like .5 not
permitted?

Please follow the custom to place the alternation character "|" at the
beginning of the next line, not at the end of the previous.

> +	 /* punctuations */
> +	 "\\{|\\}|\\(|\\)|\\.|\\.{3}|;|,|<|>|<=|>=|==|!=|={3}|!==|\\+|-|\\*|/|%|\\*{2}|"
> +	 "\\+{2}|--|<<|>>|>>>|&|\\||\\^|!|~|&&|\\|{2}|\\?{1,2}|:|=|\\+=|-=|\\*=|%=|\\*{2}=|"
> +	 "<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>|"

You could collaps many of the operators into single alternatives, but if
you prefer it this way, it is fine, too. One example:

	">{1,3}="

But please remove the single-character operators from the list, because
there is an implicit single-character alternative that is not visible in
the code here.

> +	 /* identifiers */
> +	 "[$_[:alpha:]][$_[:alnum:]]*"),
>  PATTERNS("markdown",
>  	 "^ {0,3}#{1,6}[ \t].*",
>  	 /* -- */

-- Hannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language
  2022-03-04 13:08 [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language xing zhi jiang
  2022-03-04 13:08 ` [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages xing zhi jiang
@ 2022-03-05 13:41 ` Johannes Sixt
  2022-03-12 16:48 ` [GSoC][PATCH v2] Add a diff driver for JavaScript languages xing zhi jiang
  2022-04-03 13:25 ` [GSoC][PATCH v3] " xing zhi jiang
  3 siblings, 0 replies; 25+ messages in thread
From: Johannes Sixt @ 2022-03-05 13:41 UTC (permalink / raw)
  To: xing zhi jiang; +Cc: git

Am 04.03.22 um 14:08 schrieb xing zhi jiang:
> I have already searched the git public-inbox don't find any active patch about
> userdiff build-in driver for JavaScript(there is an unfinished patch about 
> three years ago). So I pick this as my GSoC micro project.
> 
> Below are typical function patterns from JavaScript, and 
> also has an example that matches the corresponding pattern. 
> These examples come from many popular JavaScript projects. 
> Because I want to make sure the hunk header would work well 
> on real-world projects.

Thank you for the thorough research. I've already responded to the patch
about the technical solution. Here are some general thoughts.

> 
> Common function's pattern for JavaScript
> 1. normal function
>   `^[\t ]*((export[\t ]+)?((async|get|set)[\t ]+)?function[\t ]*([\t ]*\\*[\t ]*|[\t ]*)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)`
>   example: 
>   1. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L648
>   2. https://github.com/mozilla/pdf.js/blob/ad4b2ce021277ff7cea8ec7e32775c65d74ee673/test/unit/evaluator_spec.js#L40
> 2. JavaScript variable declaration with a lambda expression
>   `^^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> 	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*[\t ])[\t ]*=>[\t ]*\\{?)`
>    example:
>    1. https://github.com/webpack/webpack/blob/2279c5a2105ea1498b83a4854919aefe1a28c553/lib/ChunkGraph.js#L91
>    2. https://github.com/webpack/webpack/blob/2279c5a2105ea1498b83a4854919aefe1a28c553/lib/ChunkGraph.js#L122
>     
>    I found sometimes would define function on this way. But this should only match the top level? Because 
>    it may match inside the function, and the below code would match the wrong function[1].

Functions nested in other functions in general lead to surprising
results with a stateless parser. There is not much you can do if the
nesting level cannot be determined. Using the indentation as a proxy may
work, but it depends on that the indentation width is agreed on in the
community.

Note that the function header patterns are used for these tasks:

1. For hunk headers in patch text
2. To find the context in diff and grep --function-context
3. To find the lines of interest in log -L:function_name:file.name

For the first, mismatching the function is secondary because it is (IMO)
just a hint for the reader about where the hunk should apply.

For the others, finding the actual beginning and end of a function would
be desirable. This does not work as intended when patterns match nested
function headers. But a general solution to this problem basically means
to implement a parser for the language.

> 
> 3. exports methods by assigning an anonymous function
>   `^[\t ]*(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)`
>    example:
>    1. https://github.com/webpack/webpack/blob/c181294865dca01b28e6e316636fef5f2aad4eb6/lib/dependencies/DynamicExports.js#L17
>    2. https://github.com/ajaxorg/ace/blob/d95725983b363a616c584237013dfd36eaadbba4/lib/ace/lib/dom.js#L37
> 4. expression about assign function to LHS
>   `^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)`
>    example:
>    1. https://github.com/ajaxorg/ace/blob/94422a4a892495564c56089af85019a8f8f24673/lib/ace/anchor.js#L102
>    2. https://github.com/ajaxorg/ace/blob/d95725983b363a616c584237013dfd36eaadbba4/lib/ace/lib/dom.js#L37
>    3. https://github.com/ajaxorg/ace/blob/4257621787b4253d6d493611f4ec5a37829da323/lib/ace/search.js#L350
>    4. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L299
>    
>    Maybe this should only match on the 0,1,2 indent level? Because JavaScript may match the function assignment in another function.
> 5. normal function in object literal
>   `^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)`
>     1. https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
>     2. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L242
> 6. function in class
>   `^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)`
>     
>    This regex is tricky because the class's function is no function keyword in JavaScript. 
>    If you write the regex to match them, it will match many non-function declaration things!!! 
>    Like examples below:
>    1. the non-function matches
>      1. https://github.com/ajaxorg/ace/blob/94422a4a892495564c56089af85019a8f8f24673/lib/ace/anchor.js#L58
>      2. https://github.com/ajaxorg/ace/blob/d95725983b363a616c584237013dfd36eaadbba4/lib/ace/lib/dom.js#L132
>    2. the function in class
>      1. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L1929
>      2. https://github.com/webpack/webpack/blob/ccecc17c01af96edddb931a76e7a3b21ef2969d8/lib/Chunk.js#L179
>      3. https://github.com/webpack/webpack/blob/612de998f186a9bb2fe8769a91678df689a0541e/lib/Module.js#L242
>      4. https://github.com/mozilla/pdf.js/blob/5cf116a958548f6596674bf8d5ca0fe64aa2df3c/web/view_history.js#L75
>      5. https://github.com/mozilla/pdf.js/blob/5cf116a958548f6596674bf8d5ca0fe64aa2df3c/web/view_history.js#L89
>    
>     My solution is to add some negation rules, and one rule is skipping the keywords that may have characters immediately 
>     following them in the parenthesis, rule is `!^[ \t]*(if|do|while|for|with|switch|catch|import|return)`.
>     Another negation rule is only before this 「function in class」 regex, that skips the line's indent level more than 
>     one because most of the function in class has one indent level(the class is on top-level). The negation rule is 
>     `!^(\t{2,}|[ ]{5,})`.
>     
>     But this is not enough, because maybe has function call on one indent level. So need an negation rule for skipping 
>     statement. The negation rule is `!^.*;[ \t]*`. But the bad news is JavaScript's statement can end without a semicolon. 
>     So it still has an opportunity to recognize function call as the function declaration if the code's statement does not 
>     end with semicolons.

The patterns need not be perfect, but "only" reasonably useful, i.e.,
suitable for the majority of the actual codebase. If it turns out that
too many lines match in practice, the patterns can be tweaked.

> 
> Word's pattern for JavaScript
> In this part, I reference the formal ECMA specification heavily[2].
> JavaScript has some special syntax, such as numbers can be separated with an underscore for readability[3]. 
> And has BigInt literal, which is number end with a 「n」 character. So the number-related regex would be some 
> differences with another language.
> 
> In the last, I had a fork git project on Github. And has the CI's result, the all test cases pass[4].
> 
> [1] https://github.com/webpack/webpack/blob/2279c5a2105ea1498b83a4854919aefe1a28c553/lib/ChunkGraph.js#L279
> [2] https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
> [3] https://v8.dev/features/numeric-separators
> [4] https://github.com/a97410985/git/actions/runs/1933091300
> 
> xing zhi jiang (1):
>   Add a diff driver for JavaScript languages.
> 
>  .gitignore                                    |  1 +
>  Documentation/gitattributes.txt               |  2 +
>  ...avascript-assignment-of-anonymous-function |  4 ++
>  .../javascript-assignment-of-arrow-function   |  4 ++
>  .../javascript-assignment-of-named-function   |  4 ++
>  t/t4018/javascript-async-function             |  4 ++
>  t/t4018/javascript-export-async-function      |  4 ++
>  t/t4018/javascript-export-function            |  4 ++
>  t/t4018/javascript-exports-anomyous-function  |  4 ++
>  .../javascript-exports-anomyous-function-2    |  4 ++
>  t/t4018/javascript-exports-function           |  4 ++
>  t/t4018/javascript-function                   |  4 ++
>  t/t4018/javascript-function-2                 | 10 ++++
>  t/t4018/javascript-function-belong-to-IIFE    |  6 +++
>  t/t4018/javascript-function-in-class          |  6 +++
>  t/t4018/javascript-function-in-class-2        | 11 ++++
>  t/t4018/javascript-function-in-object-literal |  7 +++
>  t/t4018/javascript-generator-function         |  4 ++
>  t/t4018/javascript-generator-function-2       |  4 ++
>  t/t4018/javascript-getter-function-in-class   |  6 +++
>  t/t4018/javascript-setter-function-in-class   |  6 +++
>  .../javascript-skip-function-call-statement   |  7 +++
>  t/t4018/javascript-skip-keywords              | 34 ++++++++++++
>  t/t4018/javascript-static-function-in-class   |  6 +++
>  t/t4034-diff-words.sh                         |  1 +
>  t/t4034/javascript/expect                     | 52 +++++++++++++++++++
>  t/t4034/javascript/post                       | 32 ++++++++++++
>  t/t4034/javascript/pre                        | 32 ++++++++++++
>  userdiff.c                                    | 38 ++++++++++++++
>  29 files changed, 305 insertions(+)
>  create mode 100644 t/t4018/javascript-assignment-of-anonymous-function
>  create mode 100644 t/t4018/javascript-assignment-of-arrow-function
>  create mode 100644 t/t4018/javascript-assignment-of-named-function
>  create mode 100644 t/t4018/javascript-async-function
>  create mode 100644 t/t4018/javascript-export-async-function
>  create mode 100644 t/t4018/javascript-export-function
>  create mode 100644 t/t4018/javascript-exports-anomyous-function
>  create mode 100644 t/t4018/javascript-exports-anomyous-function-2
>  create mode 100644 t/t4018/javascript-exports-function
>  create mode 100644 t/t4018/javascript-function
>  create mode 100644 t/t4018/javascript-function-2
>  create mode 100644 t/t4018/javascript-function-belong-to-IIFE
>  create mode 100644 t/t4018/javascript-function-in-class
>  create mode 100644 t/t4018/javascript-function-in-class-2
>  create mode 100644 t/t4018/javascript-function-in-object-literal
>  create mode 100644 t/t4018/javascript-generator-function
>  create mode 100644 t/t4018/javascript-generator-function-2
>  create mode 100644 t/t4018/javascript-getter-function-in-class
>  create mode 100644 t/t4018/javascript-setter-function-in-class
>  create mode 100644 t/t4018/javascript-skip-function-call-statement
>  create mode 100644 t/t4018/javascript-skip-keywords
>  create mode 100644 t/t4018/javascript-static-function-in-class
>  create mode 100644 t/t4034/javascript/expect
>  create mode 100644 t/t4034/javascript/post
>  create mode 100644 t/t4034/javascript/pre
> 
> 
> base-commit: e6ebfd0e8cbbd10878070c8a356b5ad1b3ca464e

-- Hannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages.
  2022-03-05 10:16   ` Johannes Sixt
@ 2022-03-07 15:10     ` xing-zhi jiang
  2022-03-08  6:46       ` Johannes Sixt
  0 siblings, 1 reply; 25+ messages in thread
From: xing-zhi jiang @ 2022-03-07 15:10 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Thanks for your review and give much good advice !!!
I will apply much of your advice to my v2 patch. But before I write
the v2 patch,
there is still something to check and discuss.

On Sat, 5 Mar 2022 at 18:16, Johannes Sixt <j6t@kdbg.org> wrote:
>
> Thank you for your contribution!
>
> Am 04.03.22 um 14:08 schrieb xing zhi jiang:
> > In the xfunction part that matches normal functions,
> > a variable declaration with an assignment of function, the function declaration
> > in the class, and also the function is object literal's property.
>
> On the first read, I stumbled over the last half-sentence. Is this
> JavaScript-lingo for the construct
>
>    x = { foo: function() { ... } }
>
> A parenthetical note in this regard would be helpful.
>
> > And in the word regex part, that matches numbers, punctuations, and also the
> > JavaScript identifier.
> > This part heavily references the formal ECMA sepcification[1].
> >
> > [1]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
>
> After having seen the regex, to follow the syntax specification is
> unnecessarily tight. If you follow my suggestions, the note should
> probably be rewritten, but keeping the link to the language reference is
> certainly helpful.
OK, I will adjust the commit message but remain the reference link.

> >
> > Signed-off-by: xing zhi jiang <a97410985new@gmail.com>
> > ---
> >  .gitignore                                    |  1 +
> >  Documentation/gitattributes.txt               |  2 +
> >  ...avascript-assignment-of-anonymous-function |  4 ++
> >  .../javascript-assignment-of-arrow-function   |  4 ++
> >  .../javascript-assignment-of-named-function   |  4 ++
> >  t/t4018/javascript-async-function             |  4 ++
> >  t/t4018/javascript-export-async-function      |  4 ++
> >  t/t4018/javascript-export-function            |  4 ++
> >  t/t4018/javascript-exports-anomyous-function  |  4 ++
> >  .../javascript-exports-anomyous-function-2    |  4 ++
> >  t/t4018/javascript-exports-function           |  4 ++
> >  t/t4018/javascript-function                   |  4 ++
> >  t/t4018/javascript-function-2                 | 10 ++++
> >  t/t4018/javascript-function-belong-to-IIFE    |  6 +++
> >  t/t4018/javascript-function-in-class          |  6 +++
> >  t/t4018/javascript-function-in-class-2        | 11 ++++
> >  t/t4018/javascript-function-in-object-literal |  7 +++
> >  t/t4018/javascript-generator-function         |  4 ++
> >  t/t4018/javascript-generator-function-2       |  4 ++
> >  t/t4018/javascript-getter-function-in-class   |  6 +++
> >  t/t4018/javascript-setter-function-in-class   |  6 +++
> >  .../javascript-skip-function-call-statement   |  7 +++
> >  t/t4018/javascript-skip-keywords              | 34 ++++++++++++
> >  t/t4018/javascript-static-function-in-class   |  6 +++
> >  t/t4034-diff-words.sh                         |  1 +
> >  t/t4034/javascript/expect                     | 52 +++++++++++++++++++
> >  t/t4034/javascript/post                       | 32 ++++++++++++
> >  t/t4034/javascript/pre                        | 32 ++++++++++++
> >  userdiff.c                                    | 38 ++++++++++++++
> >  29 files changed, 305 insertions(+)
> >  create mode 100644 t/t4018/javascript-assignment-of-anonymous-function
> >  create mode 100644 t/t4018/javascript-assignment-of-arrow-function
> >  create mode 100644 t/t4018/javascript-assignment-of-named-function
> >  create mode 100644 t/t4018/javascript-async-function
> >  create mode 100644 t/t4018/javascript-export-async-function
> >  create mode 100644 t/t4018/javascript-export-function
> >  create mode 100644 t/t4018/javascript-exports-anomyous-function
> >  create mode 100644 t/t4018/javascript-exports-anomyous-function-2
> >  create mode 100644 t/t4018/javascript-exports-function
> >  create mode 100644 t/t4018/javascript-function
> >  create mode 100644 t/t4018/javascript-function-2
> >  create mode 100644 t/t4018/javascript-function-belong-to-IIFE
> >  create mode 100644 t/t4018/javascript-function-in-class
> >  create mode 100644 t/t4018/javascript-function-in-class-2
> >  create mode 100644 t/t4018/javascript-function-in-object-literal
> >  create mode 100644 t/t4018/javascript-generator-function
> >  create mode 100644 t/t4018/javascript-generator-function-2
> >  create mode 100644 t/t4018/javascript-getter-function-in-class
> >  create mode 100644 t/t4018/javascript-setter-function-in-class
> >  create mode 100644 t/t4018/javascript-skip-function-call-statement
> >  create mode 100644 t/t4018/javascript-skip-keywords
> >  create mode 100644 t/t4018/javascript-static-function-in-class
> >  create mode 100644 t/t4034/javascript/expect
> >  create mode 100644 t/t4034/javascript/post
> >  create mode 100644 t/t4034/javascript/pre
> >
> > diff --git a/.gitignore b/.gitignore
> > index f817c509ec..de628c53f8 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -244,3 +244,4 @@ Release/
> >  /git.VC.db
> >  *.dSYM
> >  /contrib/buildsystems/out
> > +/.cache
> > \ No newline at end of file
>
> Do not include this change. It does not belong to this commit.
OK

> > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> > index 60984a4682..f6554a4651 100644
> > --- a/Documentation/gitattributes.txt
> > +++ b/Documentation/gitattributes.txt
> > @@ -828,6 +828,8 @@ patterns are available:
> >
> >  - `java` suitable for source code in the Java language.
> >
> > +- `javascript suitable for source code in the JavaScript language.
>
> Please do not forget the closing quote.
I will fix it in v2 patch.

> > +
> >  - `markdown` suitable for Markdown documents.
> >
> >  - `matlab` suitable for source code in the MATLAB and Octave languages.
> > diff --git a/t/t4018/javascript-assignment-of-anonymous-function b/t/t4018/javascript-assignment-of-anonymous-function
> > new file mode 100644
> > index 0000000000..5d4701e84c
> > --- /dev/null
> > +++ b/t/t4018/javascript-assignment-of-anonymous-function
> > @@ -0,0 +1,4 @@
> > +const RIGHT = function (a, b) {
> > +
> > +    return a + b; // ChangeMe
> > +};
> > \ No newline at end of file
>
> Notice this "No newline at end of file". Please complete the last line
> of the file, i.e. do add the last line break. Same for all other new
> files introduced in this patch.
I will fix it in v2 patch.

> > diff --git a/t/t4018/javascript-assignment-of-arrow-function b/t/t4018/javascript-assignment-of-arrow-function
> > new file mode 100644
> > index 0000000000..6ab73ccb7a
> > --- /dev/null
> > +++ b/t/t4018/javascript-assignment-of-arrow-function
> > @@ -0,0 +1,4 @@
> > +const RIGHT = (a, b) => {
> > +
> > +    return a + b; // ChangeMe
> > +};
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-assignment-of-named-function b/t/t4018/javascript-assignment-of-named-function
> > new file mode 100644
> > index 0000000000..85d43123a6
> > --- /dev/null
> > +++ b/t/t4018/javascript-assignment-of-named-function
> > @@ -0,0 +1,4 @@
> > +const RIGHT = function test (a, b) {
> > +
> > +    return a + b; // ChangeMe
> > +};
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-async-function b/t/t4018/javascript-async-function
> > new file mode 100644
> > index 0000000000..ebc7c8c05b
> > --- /dev/null
> > +++ b/t/t4018/javascript-async-function
> > @@ -0,0 +1,4 @@
> > +async function RIGHT(a, b) {
> > +
> > +  return a + b; // ChangeMe
> > +}
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-export-async-function b/t/t4018/javascript-export-async-function
> > new file mode 100644
> > index 0000000000..3cd60b7980
> > --- /dev/null
> > +++ b/t/t4018/javascript-export-async-function
> > @@ -0,0 +1,4 @@
> > +export async function RIGHT(a, b) {
> > +
> > +  return a + b; // ChangeMe
> > +}
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-export-function b/t/t4018/javascript-export-function
> > new file mode 100644
> > index 0000000000..177ddec779
> > --- /dev/null
> > +++ b/t/t4018/javascript-export-function
> > @@ -0,0 +1,4 @@
> > +export function RIGHT(a, b) {
> > +
> > +  return a + b; // ChangeMe
> > +}
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-exports-anomyous-function b/t/t4018/javascript-exports-anomyous-function
> > new file mode 100644
> > index 0000000000..45b0ecd659
> > --- /dev/null
> > +++ b/t/t4018/javascript-exports-anomyous-function
> > @@ -0,0 +1,4 @@
> > +exports.setFlagged = RIGHT => {
> > +
> > +    return ChangeMe;
> > +};
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-exports-anomyous-function-2 b/t/t4018/javascript-exports-anomyous-function-2
> > new file mode 100644
> > index 0000000000..0c572bfde3
> > --- /dev/null
> > +++ b/t/t4018/javascript-exports-anomyous-function-2
> > @@ -0,0 +1,4 @@
> > +exports.RIGHT = (a, b, runtime) => {
> > +
> > +    return ChangeMe;
> > +};
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-exports-function b/t/t4018/javascript-exports-function
> > new file mode 100644
> > index 0000000000..f1587fddac
> > --- /dev/null
> > +++ b/t/t4018/javascript-exports-function
> > @@ -0,0 +1,4 @@
> > +exports.RIGHT = function(document) {
> > +
> > +    return ChangeMe
> > +}
>
> Not a particularly important point, but the correct syntax requires a
> semicolon here, I guess.
>
Yes, I forgot the semicolon here.

> > +                                     delete $installedChunks$[chunkId];
> > +                             });
> > +                             currentUpdateRemovedChunks = ChangeMe;
> > +   }
> > +  }
> > +}
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-function-belong-to-IIFE b/t/t4018/javascript-function-belong-to-IIFE
> > new file mode 100644
> > index 0000000000..6a80118e8a
> > --- /dev/null
> > +++ b/t/t4018/javascript-function-belong-to-IIFE
> > @@ -0,0 +1,6 @@
> > +(function () {
> > +  this.$RIGHT = function (needle, modifier) {
> > +      let a = 5;
> > +      return ChangeMe;
> > +  };
> > +}).call(aaaa.prototype);
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-function-in-class b/t/t4018/javascript-function-in-class
> > new file mode 100644
> > index 0000000000..4b2f9c37e0
> > --- /dev/null
> > +++ b/t/t4018/javascript-function-in-class
> > @@ -0,0 +1,6 @@
> > +class Test {
> > +  RIGHT() {
> > +    let a = 4;
> > +    let b = ChangeMe;
> > +  }
> > +}
> > \ No newline at end of file
> > diff --git a/t/t4018/javascript-function-in-class-2 b/t/t4018/javascript-function-in-class-2
> > new file mode 100644
> > index 0000000000..402c4aecc3
> > --- /dev/null
> > +++ b/t/t4018/javascript-function-in-class-2
> > @@ -0,0 +1,11 @@
> > +class Test {
> > +  RIGHT(
> > +      aaaaaaaaaa,
> > +      bbbbbbbbbb,
> > +      cccccccccc,
> > +      dddddddddd
> > +  ) {
> > +    let a = 4;
> > +    let b = ChangeMe;
> > +  }
> > +}
> > \ No newline at end of file
>
> In the above two, we see class member functions. Is there a test case
> where the function parameter is on the same line as the function name or
> is that one of the difficult cases?
I can't get what you mean, but the second Test case wants to make sure we
can match the function with a long parameter list, and its parameters are
formatted to multiline.

> > diff --git a/t/t4034/javascript/post b/t/t4034/javascript/post
> > new file mode 100644
> > index 0000000000..8bc3e3af12
> > --- /dev/null
> > +++ b/t/t4034/javascript/post
> > @@ -0,0 +1,32 @@
> > +// DecimalLiteral
> > +124
> > +0.124
> > +0.123e-5
> > +0.123E-5
> > +0.123E5
> > +12222222222222222233344445n
> > +// HexIntegerLiteral
> > +0x11
> > +0X5Fa1
> > +0x123_756
> > +0x1234182989812f1289bn
> > +// OctalIntegerLiteral
> > +06
> > +0o5
> > +0O4
> > +0511_567
> > +0o424242424242424242424242424242666667n
> > +// BinaryIntegerLiteral
> > +0b1101
> > +0B0010
> > +0b0001_1101_0011
> > +0b11111111111111000011111111111111111n
> > +// punctuations
> > +{b} (b)
> > +b;
> > +[2,3]
> > +[ ...params_v2 ]
> > +a=2 a=2 a=2 a=2 a=2 a=2 a=2 a=2
> > +a-=b a%=b a&&=b a|=b
> > +b-c a++ a<<b a>>>=b a<<=b
> > +a&b a|b a??=b
> > \ No newline at end of file
> > diff --git a/t/t4034/javascript/pre b/t/t4034/javascript/pre
> > new file mode 100644
> > index 0000000000..b72238f779
> > --- /dev/null
> > +++ b/t/t4034/javascript/pre
> > @@ -0,0 +1,32 @@
> > +// DecimalLiteral
> > +123
> > +0.123
> > +0.123e+5
> > +0.123E+5
> > +0.123e5
> > +1222222222222222223334444n
> > +// HexIntegerLiteral
> > +0x10
> > +0X6Fa1
> > +0x123_456
> > +0x1234182989812f1289an
> > +// OctalIntegerLiteral
> > +05
> > +0o6
> > +0O7
> > +0512_567
> > +0o424242424242424242424242424242666666n
> > +// BinaryIntegerLiteral
> > +0b1001
> > +0B0110
> > +0b0001_1001_0011
> > +0b1111111111111111111111111111111111111n
> > +// punctuations
> > +{a} (a)
> > +a;
> > +[1,2]
> > +[ 1, 2, ...params ]
> > +a<=2 a>=2 a==2 a!=2 a===2 a!==2 a^=2 a=>2
> > +a+=b a*=b a**=b a||=b
> > +b+c a-- a>>b a>>>b a>>=b
> > +a&&b a||b a&&=b
> > \ No newline at end of file
> > diff --git a/userdiff.c b/userdiff.c
> > index 8578cb0d12..a6a341e3c1 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -168,6 +168,44 @@ PATTERNS("java",
> >        "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> >        "|[-+*/<>%&^|=!]="
> >        "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> > +
> > +PATTERNS("javascript",
> > +      /* don't match the expression may contain parenthesis, because it is not a function declaration */
> > +      "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
>
> These will not match
>
>         }while (expr)
>
> note the absent blank before the keyword, but that is an acceptable
> trade-off to keep things simple. Good.
>
I thought not to match with `}while (expr)` is OK because this rule is
mainly to prevent matching too many
 wrong things about the regex for 「the function in the class」.
And before keywords maybe need blanks, because some javascript
formatting style is using space
instead of tab.


> > +      /* don't match statement */
> > +      "!^.*;[ \t]*\n"
>
> This regexp can be reduced to
>
>         "!;\n"
>
good advice, I would apply it.

> > +      /* match normal function */
> > +      "^[\t ]*((export[\t ]+)?(async[\t ]+)?function[\t ]*([\t ]*\\*[\t ]*|[\t ]*)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
>
> Good. One note though: keyword "function" can optionally be followed by
> an asterisk '*'. You can probably simplify the middle part to
>
>         ...function[\t *]*...identifier...
good advice, I would apply it.

> > +      /* match JavaScript variable declaration with a lambda expression */
> > +      "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> > +      "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*[\t ])[\t ]*=>[\t ]*\\{?)\n"
>
> Let's break this down:
>
>         "^[\t ]*"
>             "("
>                 "(const|let|var)[\t ]*"
>                 "[$_[:alpha:]][$_[:alnum:]]*[\t ]*"
>                 "=[\t ]*"
>                 "("
>                     "\\(.*\\)"
>                     "|"
>                     "[$_[:alpha:]][$_[:alnum:]]*[\t ]"
>                 ")[\t ]*"
>                 "=>[\t ]*"
>                 "\\{?"
>             ")\n"
>
> Can you not have
>
>         var f = foo=>{
>
> because I see that whitespace is required between the identifier and "=>"?
It is my mistake. I should not match tab or space between identifier and =>.
Thanks for pointing it out. I will fix it.

> > +      /* match exports for anonymous fucntion */
> > +      "^[\t ]*(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
>
> Here, whitespace is not required. Is the above an oversight?
>
> BTW, can keyword "exports" be used for something other than functions?
>
「Whitespace is not required」 is only referred to 「^[\t ]*(exports」? I
would remove the [\t ]* before exports
keyword because exports must be top-level.
The exports keyword can be used for exporting many things, such as
string. ex: exports.SimpleMessage = 'Hello world';
, So I define a more precise regex for only matching anonymous functions.

> > +      /* match assign function to LHS */
> > +      "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"
>
> This makes me think that whenever the keyword "function" appears, then we
> see the beginning of a function. This would allow a simple pattern
> upfront that picks out all functions defined with this keyword, and all
> other patterns need only be concerned with the exceptional cases.
>
>         /* "function" is first non-space token */
>         "^[\t ]*function[\t ].*)\n"
>         /* "function" is not first token */
>         "^.*[^$_[:alnum:]]function[\t ].*\n"
This is good advice, I would try.

> > +      /* match normal function in object literal */
> > +      "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n"
> > +      /* don't match the function in class, which has more than one ident level */
> > +      "!^(\t{2,}|[ ]{5,})\n"
>
> For some, hopefully universally agreed upon in the JavaScript community,
> definition of indentation level. Ok...
I had searched the web, the javascript's style about indentation level
has two kinds -
ident with 2 or 4 spaces. So I don't match the line, which begins with
more than four
spaces. If javascript must end with a semicolon, it doesn't need this rule :(

> > +      /* match function in class */
> > +      "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",
>
> Aren't "get" and "set" as universal identifiers of functions or can they
> occur on other contexts? Thinking of it, they can occur in comments
> everywhere, so they would pick up too many matches if treated like my
> "function" proposal above.
"get" and "set" are not javascript keywords, we could write code like
"let get = 5;"
In my view, it wouldn't match the comment, because the comment has "//" or "/*.
But I would think about it.

> > +      /* word regex */
> > +      /* hexIntegerLiteral and bigHexIntegerLiteral*/
> > +      "0[xX][0-9a-fA-F][_0-9a-fA-F]*n?|"
> > +      /* octalIntegerLiteral and bigOctalIntegerLiteral */
> > +      "0[oO]?[0-7][_0-7]*n?|"
> > +      /* binaryIntegerLiteral and bigBinaryIntegerLiteral */
> > +      "(0[bB][01][_01]*n?)|"
> > +      /* decimalLiteral */
> > +      "(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?|"
> > +      /* bigDecimalLiteral */
> > +      "(0|[1-9][_0-9]*)n|"
>
> You do not have to make the word-regex so tight that it excludes
> incorrect literals because you can assume that incorrect literals will
> not occur. In particular integers beginning with a 0 need not be treated
> specially. You can fold the octal and decimal integers into a single
> expression:
>
>         "[0-9][oO]?[_0-9.]*)([eE][+-]?[_0-9]+)?n?"
>
> Are floatingpoint literals beginning with a decimal point like .5 not
> permitted?
This is permitted. ex: "let a = .5" is valid.
And I would follow your advice to combine multiple regexes.

> Please follow the custom to place the alternation character "|" at the
> beginning of the next line, not at the end of the previous.
OK.

> > +      /* punctuations */
> > +      "\\{|\\}|\\(|\\)|\\.|\\.{3}|;|,|<|>|<=|>=|==|!=|={3}|!==|\\+|-|\\*|/|%|\\*{2}|"
> > +      "\\+{2}|--|<<|>>|>>>|&|\\||\\^|!|~|&&|\\|{2}|\\?{1,2}|:|=|\\+=|-=|\\*=|%=|\\*{2}=|"
> > +      "<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>|"
>
> You could collaps many of the operators into single alternatives, but if
> you prefer it this way, it is fine, too. One example:
>
>         ">{1,3}="
>
> But please remove the single-character operators from the list, because
> there is an implicit single-character alternative that is not visible in
> the code here.
>
> > +      /* identifiers */
> > +      "[$_[:alpha:]][$_[:alnum:]]*"),
> >  PATTERNS("markdown",
> >        "^ {0,3}#{1,6}[ \t].*",
> >        /* -- */
>
> -- Hannes
OK.

In the end, I am very much grateful for your review.

Best regards.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages.
  2022-03-07 15:10     ` xing-zhi jiang
@ 2022-03-08  6:46       ` Johannes Sixt
  2022-03-12 16:59         ` xing zhi jiang
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Sixt @ 2022-03-08  6:46 UTC (permalink / raw)
  To: xing-zhi jiang; +Cc: git

Am 07.03.22 um 16:10 schrieb xing-zhi jiang:
> On Sat, 5 Mar 2022 at 18:16, Johannes Sixt <j6t@kdbg.org> wrote:
>> Am 04.03.22 um 14:08 schrieb xing zhi jiang:
>>> diff --git a/t/t4018/javascript-function-in-class-2 b/t/t4018/javascript-function-in-class-2
>>> new file mode 100644
>>> index 0000000000..402c4aecc3
>>> --- /dev/null
>>> +++ b/t/t4018/javascript-function-in-class-2
>>> @@ -0,0 +1,11 @@
>>> +class Test {
>>> +  RIGHT(
>>> +      aaaaaaaaaa,
>>> +      bbbbbbbbbb,
>>> +      cccccccccc,
>>> +      dddddddddd
>>> +  ) {
>>> +    let a = 4;
>>> +    let b = ChangeMe;
>>> +  }
>>> +}
>>> \ No newline at end of file
>>
>> In the above two, we see class member functions. Is there a test case
>> where the function parameter is on the same line as the function name or
>> is that one of the difficult cases?
> I can't get what you mean, but the second Test case wants to make sure we
> can match the function with a long parameter list, and its parameters are
> formatted to multiline.

I meant a multi-line function call like so, where the first argument is
on the same line with the function name:

  RIGHT(aaaaaaaaaa,
      bbbbbbbbbb,
      cccccccccc,
      dddddddddd
  ) {
  ...

>>> diff --git a/userdiff.c b/userdiff.c
>>> index 8578cb0d12..a6a341e3c1 100644
>>> --- a/userdiff.c
>>> +++ b/userdiff.c
>>> @@ -168,6 +168,44 @@ PATTERNS("java",
>>>        "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>>>        "|[-+*/<>%&^|=!]="
>>>        "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
>>> +
>>> +PATTERNS("javascript",
>>> +      /* don't match the expression may contain parenthesis, because it is not a function declaration */
>>> +      "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
>>
>> These will not match
>>
>>         }while (expr)
>>
>> note the absent blank before the keyword, but that is an acceptable
>> trade-off to keep things simple. Good.
>>
> I thought not to match with `}while (expr)` is OK because this rule is
> mainly to prevent matching too many
>  wrong things about the regex for 「the function in the class」.
> And before keywords maybe need blanks, because some javascript
> formatting style is using space
> instead of tab.

I don't recall why I emphasize the absent blank between '}' and "while".
I must have misread something. At any rate, the regex would not match,
either, if there is whitespace between the two, so, it is totally OK.

>> Can you not have
>>
>>         var f = foo=>{
>>
>> because I see that whitespace is required between the identifier and "=>"?
> It is my mistake. I should not match tab or space between identifier and =>.
> Thanks for pointing it out. I will fix it.
> 
>>> +      /* match exports for anonymous fucntion */
>>> +      "^[\t ]*(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
>>
>> Here, whitespace is not required. Is the above an oversight?
>>
>> BTW, can keyword "exports" be used for something other than functions?
>>
> 「Whitespace is not required」 is only referred to 「^[\t ]*(exports」?

No, I refer to the whitespace around "=>" that is not required here, but
was required in the previous expression.

> I
> would remove the [\t ]* before exports
> keyword because exports must be top-level.
> The exports keyword can be used for exporting many things, such as
> string. ex: exports.SimpleMessage = 'Hello world';
> , So I define a more precise regex for only matching anonymous functions.

Understood. "exports" is a keyword only in certain contexts. Since it
can be used as (or like?) a variable name, it cannot be used as match
that always identifies a function header.

-- Hannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-04 13:08 [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language xing zhi jiang
  2022-03-04 13:08 ` [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages xing zhi jiang
  2022-03-05 13:41 ` [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language Johannes Sixt
@ 2022-03-12 16:48 ` xing zhi jiang
  2022-03-13 21:54   ` Johannes Sixt
  2022-03-14 17:20   ` Glen Choo
  2022-04-03 13:25 ` [GSoC][PATCH v3] " xing zhi jiang
  3 siblings, 2 replies; 25+ messages in thread
From: xing zhi jiang @ 2022-03-12 16:48 UTC (permalink / raw)
  To: git; +Cc: j6t

In the xfunction part that matches normal functions,
a variable declaration with an assignment of function, the function declaration
in the class, and also the function is object literal's property[1].

And in the word regex part, that matches numbers, punctuations, and also the
JavaScript identifier.
This part reference the formal ECMA specification[2].

[1]https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
[2]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar

Signed-off-by: xing zhi jiang <a97410985new@gmail.com>
---
Range-diff against v1:
1:  7f764f97cf ! 1:  3b326bd2b6 Add a diff driver for JavaScript languages.
    @@ Commit message
     
         In the xfunction part that matches normal functions,
         a variable declaration with an assignment of function, the function declaration
    -    in the class, and also the function is object literal's property.
    +    in the class, and also the function is object literal's property[1].
     
         And in the word regex part, that matches numbers, punctuations, and also the
         JavaScript identifier.
    -    This part heavily references the formal ECMA sepcification[1].
    +    This part reference the formal ECMA specification[2].
     
    -    [1]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
    +    [1]https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
    +    [2]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
     
         Signed-off-by: xing zhi jiang <a97410985new@gmail.com>
     
    - ## .gitignore ##
    -@@ .gitignore: Release/
    - /git.VC.db
    - *.dSYM
    - /contrib/buildsystems/out
    -+/.cache
    - \ No newline at end of file
    -
      ## Documentation/gitattributes.txt ##
     @@ Documentation/gitattributes.txt: patterns are available:
      
      - `java` suitable for source code in the Java language.
      
    -+- `javascript suitable for source code in the JavaScript language.
    ++- `javascript` suitable for source code in the JavaScript language.
     +
      - `markdown` suitable for Markdown documents.
      
    @@ t/t4018/javascript-assignment-of-anonymous-function (new)
     +	
     +    return a + b; // ChangeMe
     +};
    - \ No newline at end of file
     
      ## t/t4018/javascript-assignment-of-arrow-function (new) ##
     @@
    @@ t/t4018/javascript-assignment-of-arrow-function (new)
     +	
     +    return a + b; // ChangeMe
     +};
    - \ No newline at end of file
    +
    + ## t/t4018/javascript-assignment-of-arrow-function-2 (new) ##
    +@@
    ++const RIGHT = (a, b)=>{
    ++	
    ++    return a + b; // ChangeMe
    ++};
    +
    + ## t/t4018/javascript-assignment-of-arrow-function-3 (new) ##
    +@@
    ++const RIGHT=test=>{
    ++	
    ++    return test + 1; // ChangeMe
    ++};
     
      ## t/t4018/javascript-assignment-of-named-function (new) ##
     @@
    @@ t/t4018/javascript-assignment-of-named-function (new)
     +	
     +    return a + b; // ChangeMe
     +};
    - \ No newline at end of file
     
      ## t/t4018/javascript-async-function (new) ##
     @@
    @@ t/t4018/javascript-async-function (new)
     +  
     +  return a + b; // ChangeMe
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-export-async-function (new) ##
     @@
    @@ t/t4018/javascript-export-async-function (new)
     +  
     +  return a + b; // ChangeMe
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-export-function (new) ##
     @@
    @@ t/t4018/javascript-export-function (new)
     +  
     +  return a + b; // ChangeMe
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-exports-anomyous-function (new) ##
     @@
    @@ t/t4018/javascript-exports-anomyous-function (new)
     +	
     +    return ChangeMe;
     +};
    - \ No newline at end of file
     
      ## t/t4018/javascript-exports-anomyous-function-2 (new) ##
     @@
    @@ t/t4018/javascript-exports-anomyous-function-2 (new)
     +	
     +    return ChangeMe;
     +};
    - \ No newline at end of file
     
      ## t/t4018/javascript-exports-function (new) ##
     @@
     +exports.RIGHT = function(document) {
     +    
    -+    return ChangeMe
    ++    return ChangeMe;
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-function (new) ##
     @@
    @@ t/t4018/javascript-function (new)
     +
     +  return a + b; // ChangeMe
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-function-2 (new) ##
     @@
    @@ t/t4018/javascript-function-2 (new)
     +   }
     +  }
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-function-belong-to-IIFE (new) ##
     @@
    @@ t/t4018/javascript-function-belong-to-IIFE (new)
     +      return ChangeMe;
     +  };
     +}).call(aaaa.prototype);
    - \ No newline at end of file
     
      ## t/t4018/javascript-function-in-class (new) ##
     @@
    @@ t/t4018/javascript-function-in-class (new)
     +    let b = ChangeMe;
     +  }
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-function-in-class-2 (new) ##
     @@
    @@ t/t4018/javascript-function-in-class-2 (new)
     +    let b = ChangeMe;
     +  }
     +}
    - \ No newline at end of file
    +
    + ## t/t4018/javascript-function-in-class-3 (new) ##
    +@@
    ++class Test {
    ++  RIGHT(aaaaaaaaaa,
    ++      bbbbbbbbbb,
    ++      cccccccccc,
    ++      dddddddddd
    ++  ) {
    ++    let a = 4;
    ++    let b = ChangeMe;
    ++  }
    ++}
     
      ## t/t4018/javascript-function-in-object-literal (new) ##
     @@
    @@ t/t4018/javascript-function-in-object-literal (new)
     +        return ChangeMe
     +    }
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-generator-function (new) ##
     @@
    @@ t/t4018/javascript-generator-function (new)
     +  
     +  return a + b; // ChangeMe
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-generator-function-2 (new) ##
     @@
    @@ t/t4018/javascript-generator-function-2 (new)
     +  
     +  return a + b; // ChangeMe
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-getter-function-in-class (new) ##
     @@
    @@ t/t4018/javascript-getter-function-in-class (new)
     +    let b = ChangeMe;
     +  }
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-setter-function-in-class (new) ##
     @@
    @@ t/t4018/javascript-setter-function-in-class (new)
     +    let b = ChangeMe;
     +  }
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-skip-function-call-statement (new) ##
     @@
    @@ t/t4018/javascript-skip-function-call-statement (new)
     +    let b = ChangeMe;
     +  }
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-skip-keywords (new) ##
     @@
    @@ t/t4018/javascript-skip-keywords (new)
     +
     +  return a + b; // ChangeMe
     +}
    - \ No newline at end of file
     
      ## t/t4018/javascript-static-function-in-class (new) ##
     @@
    @@ t/t4018/javascript-static-function-in-class (new)
     +    let b = ChangeMe;
     +  }
     +}
    - \ No newline at end of file
     
      ## t/t4034-diff-words.sh ##
     @@ t/t4034-diff-words.sh: test_language_driver dts
    @@ t/t4034-diff-words.sh: test_language_driver dts
      ## t/t4034/javascript/expect (new) ##
     @@
     +<BOLD>diff --git a/pre b/post<RESET>
    -+<BOLD>index b72238f..8bc3e3a 100644<RESET>
    ++<BOLD>index 18f4796..46f9b62 100644<RESET>
     +<BOLD>--- a/pre<RESET>
     +<BOLD>+++ b/post<RESET>
    -+<CYAN>@@ -1,32 +1,32 @@<RESET>
    ++<CYAN>@@ -1,33 +1,33 @@<RESET>
     +// DecimalLiteral<RESET>
     +<RED>123<RESET>
     +<RED>0.123<RESET>
    ++<RED>.123<RESET>
     +<RED>0.123e+5<RESET>
     +<RED>0.123E+5<RESET>
     +<RED>0.123e5<RESET>
     +<RED>1222222222222222223334444n<RESET><GREEN>124<RESET>
     +<GREEN>0.124<RESET>
    ++<GREEN>.124<RESET>
     +<GREEN>0.123e-5<RESET>
     +<GREEN>0.123E-5<RESET>
     +<GREEN>0.123E5<RESET>
    @@ t/t4034/javascript/post (new)
     +// DecimalLiteral
     +124
     +0.124
    ++.124
     +0.123e-5
     +0.123E-5
     +0.123E5
    @@ t/t4034/javascript/post (new)
     +a-=b a%=b a&&=b a|=b
     +b-c a++ a<<b a>>>=b a<<=b
     +a&b a|b a??=b
    - \ No newline at end of file
     
      ## t/t4034/javascript/pre (new) ##
     @@
     +// DecimalLiteral
     +123
     +0.123
    ++.123
     +0.123e+5
     +0.123E+5
     +0.123e5
    @@ t/t4034/javascript/pre (new)
     +a+=b a*=b a**=b a||=b
     +b+c a-- a>>b a>>>b a>>=b
     +a&&b a||b a&&=b
    - \ No newline at end of file
     
      ## userdiff.c ##
     @@ userdiff.c: PATTERNS("java",
    @@ userdiff.c: PATTERNS("java",
     +	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
     +	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
     +	 /* don't match statement */
    -+	 "!^.*;[ \t]*\n"
    ++	 "!;\n"
     +	 /* match normal function */
    -+	 "^[\t ]*((export[\t ]+)?(async[\t ]+)?function[\t ]*([\t ]*\\*[\t ]*|[\t ]*)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
    ++	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
     +	 /* match JavaScript variable declaration with a lambda expression */
     +	 "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
    -+	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*[\t ])[\t ]*=>[\t ]*\\{?)\n"
    ++	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
     +	 /* match exports for anonymous fucntion */
    -+	 "^[\t ]*(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
    ++	 "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
     +	 /* match assign function to LHS */
     +	 "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"
     +	 /* match normal function in object literal */
    @@ userdiff.c: PATTERNS("java",
     +	 /* match function in class */
     +	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",
     +	 /* word regex */
    -+	 /* hexIntegerLiteral and bigHexIntegerLiteral*/
    -+	 "0[xX][0-9a-fA-F][_0-9a-fA-F]*n?|"
    -+	 /* octalIntegerLiteral and bigOctalIntegerLiteral */
    -+	 "0[oO]?[0-7][_0-7]*n?|"
    -+	 /* binaryIntegerLiteral and bigBinaryIntegerLiteral */
    -+	 "(0[bB][01][_01]*n?)|"
    -+	 /* decimalLiteral */
    -+	 "(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?|"
    -+	 /* bigDecimalLiteral */
    -+	 "(0|[1-9][_0-9]*)n|"
    ++	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, DecimalLiteral and its big version */
    ++	 "(0[xXoObB])?[0-9a-fA-F][_0-9a-fA-F]*n?"
    ++	 /* DecimalLiteral may be float */
    ++	 "|(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
     +	 /* punctuations */
    -+	 "\\{|\\}|\\(|\\)|\\.|\\.{3}|;|,|<|>|<=|>=|==|!=|={3}|!==|\\+|-|\\*|/|%|\\*{2}|"
    -+	 "\\+{2}|--|<<|>>|>>>|&|\\||\\^|!|~|&&|\\|{2}|\\?{1,2}|:|=|\\+=|-=|\\*=|%=|\\*{2}=|"
    -+	 "<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>|"
    ++	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
    ++	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
    ++	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
     +	 /* identifiers */
    -+	 "[$_[:alpha:]][$_[:alnum:]]*"),
    ++	 "|[$_[:alpha:]][$_[:alnum:]]*"),
      PATTERNS("markdown",
      	 "^ {0,3}#{1,6}[ \t].*",
      	 /* -- */

 Documentation/gitattributes.txt               |  2 +
 ...avascript-assignment-of-anonymous-function |  4 ++
 .../javascript-assignment-of-arrow-function   |  4 ++
 .../javascript-assignment-of-arrow-function-2 |  4 ++
 .../javascript-assignment-of-arrow-function-3 |  4 ++
 .../javascript-assignment-of-named-function   |  4 ++
 t/t4018/javascript-async-function             |  4 ++
 t/t4018/javascript-export-async-function      |  4 ++
 t/t4018/javascript-export-function            |  4 ++
 t/t4018/javascript-exports-anomyous-function  |  4 ++
 .../javascript-exports-anomyous-function-2    |  4 ++
 t/t4018/javascript-exports-function           |  4 ++
 t/t4018/javascript-function                   |  4 ++
 t/t4018/javascript-function-2                 | 10 ++++
 t/t4018/javascript-function-belong-to-IIFE    |  6 +++
 t/t4018/javascript-function-in-class          |  6 +++
 t/t4018/javascript-function-in-class-2        | 11 ++++
 t/t4018/javascript-function-in-class-3        | 10 ++++
 t/t4018/javascript-function-in-object-literal |  7 +++
 t/t4018/javascript-generator-function         |  4 ++
 t/t4018/javascript-generator-function-2       |  4 ++
 t/t4018/javascript-getter-function-in-class   |  6 +++
 t/t4018/javascript-setter-function-in-class   |  6 +++
 .../javascript-skip-function-call-statement   |  7 +++
 t/t4018/javascript-skip-keywords              | 34 ++++++++++++
 t/t4018/javascript-static-function-in-class   |  6 +++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/javascript/expect                     | 54 +++++++++++++++++++
 t/t4034/javascript/post                       | 33 ++++++++++++
 t/t4034/javascript/pre                        | 33 ++++++++++++
 userdiff.c                                    | 32 +++++++++++
 31 files changed, 320 insertions(+)
 create mode 100644 t/t4018/javascript-assignment-of-anonymous-function
 create mode 100644 t/t4018/javascript-assignment-of-arrow-function
 create mode 100644 t/t4018/javascript-assignment-of-arrow-function-2
 create mode 100644 t/t4018/javascript-assignment-of-arrow-function-3
 create mode 100644 t/t4018/javascript-assignment-of-named-function
 create mode 100644 t/t4018/javascript-async-function
 create mode 100644 t/t4018/javascript-export-async-function
 create mode 100644 t/t4018/javascript-export-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function-2
 create mode 100644 t/t4018/javascript-exports-function
 create mode 100644 t/t4018/javascript-function
 create mode 100644 t/t4018/javascript-function-2
 create mode 100644 t/t4018/javascript-function-belong-to-IIFE
 create mode 100644 t/t4018/javascript-function-in-class
 create mode 100644 t/t4018/javascript-function-in-class-2
 create mode 100644 t/t4018/javascript-function-in-class-3
 create mode 100644 t/t4018/javascript-function-in-object-literal
 create mode 100644 t/t4018/javascript-generator-function
 create mode 100644 t/t4018/javascript-generator-function-2
 create mode 100644 t/t4018/javascript-getter-function-in-class
 create mode 100644 t/t4018/javascript-setter-function-in-class
 create mode 100644 t/t4018/javascript-skip-function-call-statement
 create mode 100644 t/t4018/javascript-skip-keywords
 create mode 100644 t/t4018/javascript-static-function-in-class
 create mode 100644 t/t4034/javascript/expect
 create mode 100644 t/t4034/javascript/post
 create mode 100644 t/t4034/javascript/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 60984a4682..a8e3e4d735 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -828,6 +828,8 @@ patterns are available:
 
 - `java` suitable for source code in the Java language.
 
+- `javascript` suitable for source code in the JavaScript language.
+
 - `markdown` suitable for Markdown documents.
 
 - `matlab` suitable for source code in the MATLAB and Octave languages.
diff --git a/t/t4018/javascript-assignment-of-anonymous-function b/t/t4018/javascript-assignment-of-anonymous-function
new file mode 100644
index 0000000000..b6f2ccccfc
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-anonymous-function
@@ -0,0 +1,4 @@
+const RIGHT = function (a, b) {
+	
+    return a + b; // ChangeMe
+};
diff --git a/t/t4018/javascript-assignment-of-arrow-function b/t/t4018/javascript-assignment-of-arrow-function
new file mode 100644
index 0000000000..24ce517b7a
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-arrow-function
@@ -0,0 +1,4 @@
+const RIGHT = (a, b) => {
+	
+    return a + b; // ChangeMe
+};
diff --git a/t/t4018/javascript-assignment-of-arrow-function-2 b/t/t4018/javascript-assignment-of-arrow-function-2
new file mode 100644
index 0000000000..bbf5de369e
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-arrow-function-2
@@ -0,0 +1,4 @@
+const RIGHT = (a, b)=>{
+	
+    return a + b; // ChangeMe
+};
diff --git a/t/t4018/javascript-assignment-of-arrow-function-3 b/t/t4018/javascript-assignment-of-arrow-function-3
new file mode 100644
index 0000000000..4a07aa3259
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-arrow-function-3
@@ -0,0 +1,4 @@
+const RIGHT=test=>{
+	
+    return test + 1; // ChangeMe
+};
diff --git a/t/t4018/javascript-assignment-of-named-function b/t/t4018/javascript-assignment-of-named-function
new file mode 100644
index 0000000000..bfc486ebef
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-named-function
@@ -0,0 +1,4 @@
+const RIGHT = function test (a, b) {
+	
+    return a + b; // ChangeMe
+};
diff --git a/t/t4018/javascript-async-function b/t/t4018/javascript-async-function
new file mode 100644
index 0000000000..993e6926bf
--- /dev/null
+++ b/t/t4018/javascript-async-function
@@ -0,0 +1,4 @@
+async function RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-export-async-function b/t/t4018/javascript-export-async-function
new file mode 100644
index 0000000000..fecbd669d7
--- /dev/null
+++ b/t/t4018/javascript-export-async-function
@@ -0,0 +1,4 @@
+export async function RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-export-function b/t/t4018/javascript-export-function
new file mode 100644
index 0000000000..b5acbb2b08
--- /dev/null
+++ b/t/t4018/javascript-export-function
@@ -0,0 +1,4 @@
+export function RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-exports-anomyous-function b/t/t4018/javascript-exports-anomyous-function
new file mode 100644
index 0000000000..6786cbda8d
--- /dev/null
+++ b/t/t4018/javascript-exports-anomyous-function
@@ -0,0 +1,4 @@
+exports.setFlagged = RIGHT => {
+	
+    return ChangeMe;
+};
diff --git a/t/t4018/javascript-exports-anomyous-function-2 b/t/t4018/javascript-exports-anomyous-function-2
new file mode 100644
index 0000000000..883569f40d
--- /dev/null
+++ b/t/t4018/javascript-exports-anomyous-function-2
@@ -0,0 +1,4 @@
+exports.RIGHT = (a, b, runtime) => {
+	
+    return ChangeMe;
+};
diff --git a/t/t4018/javascript-exports-function b/t/t4018/javascript-exports-function
new file mode 100644
index 0000000000..63b79f5991
--- /dev/null
+++ b/t/t4018/javascript-exports-function
@@ -0,0 +1,4 @@
+exports.RIGHT = function(document) {
+    
+    return ChangeMe;
+}
diff --git a/t/t4018/javascript-function b/t/t4018/javascript-function
new file mode 100644
index 0000000000..0cc0bf54e7
--- /dev/null
+++ b/t/t4018/javascript-function
@@ -0,0 +1,4 @@
+function RIGHT(a, b) {
+
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-function-2 b/t/t4018/javascript-function-2
new file mode 100644
index 0000000000..06cfb779f0
--- /dev/null
+++ b/t/t4018/javascript-function-2
@@ -0,0 +1,10 @@
+function test(a, b) {
+  return {
+			RIGHT: function () {
+				currentUpdateRemovedChunks.forEach(function (chunkId) {
+					delete $installedChunks$[chunkId];
+				});
+				currentUpdateRemovedChunks = ChangeMe;
+   }
+  }
+}
diff --git a/t/t4018/javascript-function-belong-to-IIFE b/t/t4018/javascript-function-belong-to-IIFE
new file mode 100644
index 0000000000..6e5fe858c0
--- /dev/null
+++ b/t/t4018/javascript-function-belong-to-IIFE
@@ -0,0 +1,6 @@
+(function () {
+  this.$RIGHT = function (needle, modifier) {
+      let a = 5;
+      return ChangeMe;
+  };
+}).call(aaaa.prototype);
diff --git a/t/t4018/javascript-function-in-class b/t/t4018/javascript-function-in-class
new file mode 100644
index 0000000000..0cc0a26612
--- /dev/null
+++ b/t/t4018/javascript-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-function-in-class-2 b/t/t4018/javascript-function-in-class-2
new file mode 100644
index 0000000000..725495fe55
--- /dev/null
+++ b/t/t4018/javascript-function-in-class-2
@@ -0,0 +1,11 @@
+class Test {
+  RIGHT(
+      aaaaaaaaaa,
+      bbbbbbbbbb,
+      cccccccccc,
+      dddddddddd
+  ) {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-function-in-class-3 b/t/t4018/javascript-function-in-class-3
new file mode 100644
index 0000000000..e9b20728b2
--- /dev/null
+++ b/t/t4018/javascript-function-in-class-3
@@ -0,0 +1,10 @@
+class Test {
+  RIGHT(aaaaaaaaaa,
+      bbbbbbbbbb,
+      cccccccccc,
+      dddddddddd
+  ) {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-function-in-object-literal b/t/t4018/javascript-function-in-object-literal
new file mode 100644
index 0000000000..021cc706dd
--- /dev/null
+++ b/t/t4018/javascript-function-in-object-literal
@@ -0,0 +1,7 @@
+const obj = {
+    RIGHT: function (elems, callback, arg) {
+        var length, value;
+        // ...
+        return ChangeMe
+    }
+}
diff --git a/t/t4018/javascript-generator-function b/t/t4018/javascript-generator-function
new file mode 100644
index 0000000000..dc7793939f
--- /dev/null
+++ b/t/t4018/javascript-generator-function
@@ -0,0 +1,4 @@
+function* RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-generator-function-2 b/t/t4018/javascript-generator-function-2
new file mode 100644
index 0000000000..950676a612
--- /dev/null
+++ b/t/t4018/javascript-generator-function-2
@@ -0,0 +1,4 @@
+function *RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-getter-function-in-class b/t/t4018/javascript-getter-function-in-class
new file mode 100644
index 0000000000..9a5aee39f7
--- /dev/null
+++ b/t/t4018/javascript-getter-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  get RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-setter-function-in-class b/t/t4018/javascript-setter-function-in-class
new file mode 100644
index 0000000000..dc5f288665
--- /dev/null
+++ b/t/t4018/javascript-setter-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  set RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-skip-function-call-statement b/t/t4018/javascript-skip-function-call-statement
new file mode 100644
index 0000000000..321993c27e
--- /dev/null
+++ b/t/t4018/javascript-skip-function-call-statement
@@ -0,0 +1,7 @@
+class Test {
+  static RIGHT() {
+    haha();
+    haha2()
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-skip-keywords b/t/t4018/javascript-skip-keywords
new file mode 100644
index 0000000000..5584970b58
--- /dev/null
+++ b/t/t4018/javascript-skip-keywords
@@ -0,0 +1,34 @@
+function RIGHT(a, b) {
+  import("./async1")
+  if (a > 1) {
+    // ...
+  }
+  do {
+    // ...
+  } while (i < 5);
+  for (const element of array1) {
+    console.log(element)
+  }
+  with(o) {
+    console.log(x)
+  }
+  switch (expr) {
+    case 'a':
+      // ...
+      break;
+    case 'b':
+      // ...
+      break;
+    default:
+      // ...
+  }
+  try {
+    // ...
+    return (a + c)
+  } 
+  catch (error) {
+    // ...
+  }
+
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-static-function-in-class b/t/t4018/javascript-static-function-in-class
new file mode 100644
index 0000000000..fbf0b7ca3d
--- /dev/null
+++ b/t/t4018/javascript-static-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  static RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index d5abcf4b4c..33073edeca 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -324,6 +324,7 @@ test_language_driver dts
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
+test_language_driver javascript
 test_language_driver matlab
 test_language_driver objc
 test_language_driver pascal
diff --git a/t/t4034/javascript/expect b/t/t4034/javascript/expect
new file mode 100644
index 0000000000..419d61903b
--- /dev/null
+++ b/t/t4034/javascript/expect
@@ -0,0 +1,54 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 18f4796..46f9b62 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,33 +1,33 @@<RESET>
+// DecimalLiteral<RESET>
+<RED>123<RESET>
+<RED>0.123<RESET>
+<RED>.123<RESET>
+<RED>0.123e+5<RESET>
+<RED>0.123E+5<RESET>
+<RED>0.123e5<RESET>
+<RED>1222222222222222223334444n<RESET><GREEN>124<RESET>
+<GREEN>0.124<RESET>
+<GREEN>.124<RESET>
+<GREEN>0.123e-5<RESET>
+<GREEN>0.123E-5<RESET>
+<GREEN>0.123E5<RESET>
+<GREEN>12222222222222222233344445n<RESET>
+// HexIntegerLiteral<RESET>
+<RED>0x10<RESET>
+<RED>0X6Fa1<RESET>
+<RED>0x123_456<RESET>
+<RED>0x1234182989812f1289an<RESET><GREEN>0x11<RESET>
+<GREEN>0X5Fa1<RESET>
+<GREEN>0x123_756<RESET>
+<GREEN>0x1234182989812f1289bn<RESET>
+// OctalIntegerLiteral<RESET>
+<RED>05<RESET>
+<RED>0o6<RESET>
+<RED>0O7<RESET>
+<RED>0512_567<RESET>
+<RED>0o424242424242424242424242424242666666n<RESET><GREEN>06<RESET>
+<GREEN>0o5<RESET>
+<GREEN>0O4<RESET>
+<GREEN>0511_567<RESET>
+<GREEN>0o424242424242424242424242424242666667n<RESET>
+// BinaryIntegerLiteral<RESET>
+<RED>0b1001<RESET>
+<RED>0B0110<RESET>
+<RED>0b0001_1001_0011<RESET>
+<RED>0b1111111111111111111111111111111111111n<RESET><GREEN>0b1101<RESET>
+<GREEN>0B0010<RESET>
+<GREEN>0b0001_1101_0011<RESET>
+<GREEN>0b11111111111111000011111111111111111n<RESET>
+// punctuations<RESET>
+{<RED>a<RESET><GREEN>b<RESET>} (<RED>a<RESET><GREEN>b<RESET>)
+<RED>a<RESET><GREEN>b<RESET>;
+[<RED>1,<RESET>2<GREEN>,3<RESET>]
+[<RED>1, 2,<RESET> ...<RED>params<RESET><GREEN>params_v2<RESET> ]
+a<RED><=<RESET><GREEN>=<RESET>2 a<RED>>=<RESET><GREEN>=<RESET>2 a<RED>==<RESET><GREEN>=<RESET>2 a<RED>!=<RESET><GREEN>=<RESET>2 a<RED>===<RESET><GREEN>=<RESET>2 a<RED>!==<RESET><GREEN>=<RESET>2 a<RED>^=<RESET><GREEN>=<RESET>2 a<RED>=><RESET><GREEN>=<RESET>2
+a<RED>+=<RESET><GREEN>-=<RESET>b a<RED>*=<RESET><GREEN>%=<RESET>b a<RED>**=<RESET><GREEN>&&=<RESET>b a<RED>||=<RESET><GREEN>|=<RESET>b
+b<RED>+<RESET><GREEN>-<RESET>c a<RED>--<RESET><GREEN>++<RESET> a<RED>>><RESET><GREEN><<<RESET>b a<RED>>>><RESET><GREEN>>>>=<RESET>b a<RED>>>=<RESET><GREEN><<=<RESET>b
+a<RED>&&<RESET><GREEN>&<RESET>b a<RED>||<RESET><GREEN>|<RESET>b a<RED>&&=<RESET><GREEN>??=<RESET>b
diff --git a/t/t4034/javascript/post b/t/t4034/javascript/post
new file mode 100644
index 0000000000..46f9b627e4
--- /dev/null
+++ b/t/t4034/javascript/post
@@ -0,0 +1,33 @@
+// DecimalLiteral
+124
+0.124
+.124
+0.123e-5
+0.123E-5
+0.123E5
+12222222222222222233344445n
+// HexIntegerLiteral
+0x11
+0X5Fa1
+0x123_756
+0x1234182989812f1289bn
+// OctalIntegerLiteral
+06
+0o5
+0O4
+0511_567
+0o424242424242424242424242424242666667n
+// BinaryIntegerLiteral
+0b1101
+0B0010
+0b0001_1101_0011
+0b11111111111111000011111111111111111n
+// punctuations
+{b} (b)
+b;
+[2,3]
+[ ...params_v2 ]
+a=2 a=2 a=2 a=2 a=2 a=2 a=2 a=2
+a-=b a%=b a&&=b a|=b
+b-c a++ a<<b a>>>=b a<<=b
+a&b a|b a??=b
diff --git a/t/t4034/javascript/pre b/t/t4034/javascript/pre
new file mode 100644
index 0000000000..18f479688c
--- /dev/null
+++ b/t/t4034/javascript/pre
@@ -0,0 +1,33 @@
+// DecimalLiteral
+123
+0.123
+.123
+0.123e+5
+0.123E+5
+0.123e5
+1222222222222222223334444n
+// HexIntegerLiteral
+0x10
+0X6Fa1
+0x123_456
+0x1234182989812f1289an
+// OctalIntegerLiteral
+05
+0o6
+0O7
+0512_567
+0o424242424242424242424242424242666666n
+// BinaryIntegerLiteral
+0b1001
+0B0110
+0b0001_1001_0011
+0b1111111111111111111111111111111111111n
+// punctuations
+{a} (a)
+a;
+[1,2]
+[ 1, 2, ...params ]
+a<=2 a>=2 a==2 a!=2 a===2 a!==2 a^=2 a=>2
+a+=b a*=b a**=b a||=b
+b+c a-- a>>b a>>>b a>>=b
+a&&b a||b a&&=b
diff --git a/userdiff.c b/userdiff.c
index 8578cb0d12..51bfe4021d 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -168,6 +168,38 @@ PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+
+PATTERNS("javascript",
+	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
+	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
+	 /* don't match statement */
+	 "!;\n"
+	 /* match normal function */
+	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
+	 /* match JavaScript variable declaration with a lambda expression */
+	 "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
+	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
+	 /* match exports for anonymous fucntion */
+	 "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
+	 /* match assign function to LHS */
+	 "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"
+	 /* match normal function in object literal */
+	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n"
+	 /* don't match the function in class, which has more than one ident level */
+	 "!^(\t{2,}|[ ]{5,})\n"
+	 /* match function in class */
+	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",
+	 /* word regex */
+	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, DecimalLiteral and its big version */
+	 "(0[xXoObB])?[0-9a-fA-F][_0-9a-fA-F]*n?"
+	 /* DecimalLiteral may be float */
+	 "|(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
+	 /* punctuations */
+	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
+	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
+	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
+	 /* identifiers */
+	 "|[$_[:alpha:]][$_[:alnum:]]*"),
 PATTERNS("markdown",
 	 "^ {0,3}#{1,6}[ \t].*",
 	 /* -- */
-- 
2.35.1.273.ge6ebfd0e8c.dirty


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages.
  2022-03-08  6:46       ` Johannes Sixt
@ 2022-03-12 16:59         ` xing zhi jiang
  0 siblings, 0 replies; 25+ messages in thread
From: xing zhi jiang @ 2022-03-12 16:59 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

On Tue, 8 Mar 2022 at 14:47, Johannes Sixt <j6t@kdbg.org> wrote:
> I meant a multi-line function call like so, where the first argument is
> on the same line with the function name:
>
>  RIGHT(aaaaaaaaaa,
>      bbbbbbbbbb,
>     cccccccccc,
>    dddddddddd
> ) {
> ...

OK, I got it. I thought it is not a difficult case, but I add a new
test case for this.

>  <No, I refer to the whitespace around "=>" that is not required here, but
>  was required in the previous expression.

 I think the space is required, ex:
https://github.com/webpack/webpack/blob/c181294865dca01b28e6e316636fef5f2aad4eb6/lib/dependencies/DynamicExports.js#L17
. The link above it has space before 「=>」.

Best regards.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-12 16:48 ` [GSoC][PATCH v2] Add a diff driver for JavaScript languages xing zhi jiang
@ 2022-03-13 21:54   ` Johannes Sixt
  2022-04-03 13:17     ` xing zhi jiang
  2022-03-14 17:20   ` Glen Choo
  1 sibling, 1 reply; 25+ messages in thread
From: Johannes Sixt @ 2022-03-13 21:54 UTC (permalink / raw)
  To: xing zhi jiang; +Cc: git

When you send a new iteration of a patch or patch set, it is customary
on this list to include everyone who took part in the earlier rounds in
the Cc: list.

Am 12.03.22 um 17:48 schrieb xing zhi jiang:
> In the xfunction part that matches normal functions,
> a variable declaration with an assignment of function, the function declaration
> in the class, and also the function is object literal's property[1].
> 
> And in the word regex part, that matches numbers, punctuations, and also the
> JavaScript identifier.
> This part reference the formal ECMA specification[2].
> 
> [1]https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
> [2]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
> 
> Signed-off-by: xing zhi jiang <a97410985new@gmail.com>
> ---

> diff --git a/userdiff.c b/userdiff.c
> index 8578cb0d12..51bfe4021d 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -168,6 +168,38 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +
> +PATTERNS("javascript",
> +	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
> +	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
> +	 /* don't match statement */
> +	 "!;\n"
> +	 /* match normal function */
> +	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
> +	 /* match JavaScript variable declaration with a lambda expression */
> +	 "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> +	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"

It would help readability if this second line of this regex were
indented because it is a continuation of the first line.

> +	 /* match exports for anonymous fucntion */
> +	 "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
> +	 /* match assign function to LHS */
> +	 "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"

This should be written as

	 "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*)?\\(.*)\n"

Notice that the whitespace after the identifier can only appear when
there is actually an identifier. The point is to reduce the different
matches permitted by the sub-expression "[\t ]*[\t ]*" when there is no
identifier in the text.

Can the keyword function ever be followed by a number? I guess not. Then
[$_[:alpha:]][$_[:alnum:]]* could be reduced to [$_[:alnum:]]+

> +	 /* match normal function in object literal */
> +	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n"
> +	 /* don't match the function in class, which has more than one ident level */
> +	 "!^(\t{2,}|[ ]{5,})\n"
> +	 /* match function in class */
> +	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",> +	 /* word regex */
> +	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, DecimalLiteral and its big version */
> +	 "(0[xXoObB])?[0-9a-fA-F][_0-9a-fA-F]*n?"
> +	 /* DecimalLiteral may be float */
> +	 "|(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?"

Having alternatives that begin with an optional part make the regex
evaluation comparatively inefficient. In particular, both alternatives
above match a decimal integer. I suggest to have the first alternative
only for hex, octal, and binary integers, and the second for all decimal
numbers including floatingpoint:

	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and
their big versions */
	 "0[xXoObB][_0-9a-fA-F]+n?"
	 /* DecimalLiteral may be float */
	 "|[0-9][_0-9]*(\\.[_0-9]*|n)?([eE][+-]?[_0-9]+)?"

and if floating point literals can begin with a decimal point, then we
also need

	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"

> +	 /* punctuations */
> +	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
> +	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
> +	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
> +	 /* identifiers */
> +	 "|[$_[:alpha:]][$_[:alnum:]]*"),
>  PATTERNS("markdown",
>  	 "^ {0,3}#{1,6}[ \t].*",
>  	 /* -- */

-- Hannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-12 16:48 ` [GSoC][PATCH v2] Add a diff driver for JavaScript languages xing zhi jiang
  2022-03-13 21:54   ` Johannes Sixt
@ 2022-03-14 17:20   ` Glen Choo
  2022-03-15  7:40     ` Johannes Sixt
  2022-04-03 13:21     ` xing zhi jiang
  1 sibling, 2 replies; 25+ messages in thread
From: Glen Choo @ 2022-03-14 17:20 UTC (permalink / raw)
  To: xing zhi jiang, git; +Cc: j6t


Welcome to the Git project :) I never realized that we didn't include a
built in JS diff driver, so this would be quite welcome.

I'm not entirely familiar with this part of the system or our regex
style, so I won't comment on those. I'll only comment on the patterns we
are trying to detect and whether or not we want them.

I'm not 100% clear on the style around diff drivers e.g. how do we
decide that we want a pattern or not? I'd appreciate any pointers to
docs or commits.

xing zhi jiang <a97410985new@gmail.com> writes:

> diff --git a/t/t4018/javascript-assignment-of-anonymous-function b/t/t4018/javascript-assignment-of-anonymous-function
> new file mode 100644
> index 0000000000..b6f2ccccfc
> --- /dev/null
> +++ b/t/t4018/javascript-assignment-of-anonymous-function
> @@ -0,0 +1,4 @@
> +const RIGHT = function (a, b) {
> +	
> +    return a + b; // ChangeMe
> +};
> diff --git a/t/t4018/javascript-assignment-of-arrow-function b/t/t4018/javascript-assignment-of-arrow-function
> new file mode 100644
> index 0000000000..24ce517b7a
> --- /dev/null
> +++ b/t/t4018/javascript-assignment-of-arrow-function
> @@ -0,0 +1,4 @@
> +const RIGHT = (a, b) => {
> +	
> +    return a + b; // ChangeMe
> +};
> diff --git a/t/t4018/javascript-assignment-of-arrow-function-2 b/t/t4018/javascript-assignment-of-arrow-function-2
> new file mode 100644
> index 0000000000..bbf5de369e
> --- /dev/null
> +++ b/t/t4018/javascript-assignment-of-arrow-function-2
> @@ -0,0 +1,4 @@
> +const RIGHT = (a, b)=>{
> +	
> +    return a + b; // ChangeMe
> +};
> diff --git a/t/t4018/javascript-assignment-of-arrow-function-3 b/t/t4018/javascript-assignment-of-arrow-function-3
> new file mode 100644
> index 0000000000..4a07aa3259
> --- /dev/null
> +++ b/t/t4018/javascript-assignment-of-arrow-function-3
> @@ -0,0 +1,4 @@
> +const RIGHT=test=>{
> +	
> +    return test + 1; // ChangeMe
> +};

These are all variable assignments of anonymous functions, so we won't
'technically' be showing the function name in the diff, but as a
practical matter, they are _probably_ referred to by this name
consistently. So including them makes sense.

> diff --git a/t/t4018/javascript-assignment-of-named-function b/t/t4018/javascript-assignment-of-named-function
> new file mode 100644
> index 0000000000..bfc486ebef
> --- /dev/null
> +++ b/t/t4018/javascript-assignment-of-named-function
> @@ -0,0 +1,4 @@
> +const RIGHT = function test (a, b) {
> +	
> +    return a + b; // ChangeMe
> +};
> diff --git a/t/t4018/javascript-async-function b/t/t4018/javascript-async-function
> new file mode 100644
> index 0000000000..993e6926bf
> --- /dev/null
> +++ b/t/t4018/javascript-async-function
> @@ -0,0 +1,4 @@
> +async function RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}
> diff --git a/t/t4018/javascript-export-async-function b/t/t4018/javascript-export-async-function
> new file mode 100644
> index 0000000000..fecbd669d7
> --- /dev/null
> +++ b/t/t4018/javascript-export-async-function
> @@ -0,0 +1,4 @@
> +export async function RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}
> diff --git a/t/t4018/javascript-export-function b/t/t4018/javascript-export-function
> new file mode 100644
> index 0000000000..b5acbb2b08
> --- /dev/null
> +++ b/t/t4018/javascript-export-function
> @@ -0,0 +1,4 @@
> +export function RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}

These look good; the 'export' statements are part of the ES modules
feature. I'm not sure if it makes sense to explicitly test these cases
unless we have reason to believe that the 'export' keyword will affect
the matching.

> diff --git a/t/t4018/javascript-exports-anomyous-function b/t/t4018/javascript-exports-anomyous-function
> new file mode 100644
> index 0000000000..6786cbda8d
> --- /dev/null
> +++ b/t/t4018/javascript-exports-anomyous-function
> @@ -0,0 +1,4 @@
> +exports.setFlagged = RIGHT => {
> +	
> +    return ChangeMe;
> +};
> diff --git a/t/t4018/javascript-exports-anomyous-function-2 b/t/t4018/javascript-exports-anomyous-function-2
> new file mode 100644
> index 0000000000..883569f40d
> --- /dev/null
> +++ b/t/t4018/javascript-exports-anomyous-function-2
> @@ -0,0 +1,4 @@
> +exports.RIGHT = (a, b, runtime) => {
> +	
> +    return ChangeMe;
> +};
> diff --git a/t/t4018/javascript-exports-function b/t/t4018/javascript-exports-function
> new file mode 100644
> index 0000000000..63b79f5991
> --- /dev/null
> +++ b/t/t4018/javascript-exports-function
> @@ -0,0 +1,4 @@
> +exports.RIGHT = function(document) {
> +    
> +    return ChangeMe;
> +}

I don't think we should include 'exports.foo = bar'. To my knowledge,
this is _not_ a standard ES feature, but rather the CommonJS module
system popularized by Node.js [1] and other frameworks. To confirm this,
I searched the ES spec and did not find any reference to exports.* [2].

Even if we wanted to support nonstandard 'language features' (and this
label is tenuous at best, CommonJS is not trying to replace the ES
modules standard AFAIK), Node.js is also starting to support ES modules,
so I don't think this is a good long term direction for Git.

[1] https://nodejs.org/api/modules.html
[2] https://262.ecma-international.org/12.0/#sec-exports

> diff --git a/t/t4018/javascript-function b/t/t4018/javascript-function
> new file mode 100644
> index 0000000000..0cc0bf54e7
> --- /dev/null
> +++ b/t/t4018/javascript-function
> @@ -0,0 +1,4 @@
> +function RIGHT(a, b) {
> +
> +  return a + b; // ChangeMe
> +}
> diff --git a/t/t4018/javascript-function-2 b/t/t4018/javascript-function-2
> new file mode 100644
> index 0000000000..06cfb779f0
> --- /dev/null
> +++ b/t/t4018/javascript-function-2
> @@ -0,0 +1,10 @@
> +function test(a, b) {
> +  return {
> +			RIGHT: function () {
> +				currentUpdateRemovedChunks.forEach(function (chunkId) {
> +					delete $installedChunks$[chunkId];
> +				});
> +				currentUpdateRemovedChunks = ChangeMe;
> +   }
> +  }
> +}

There is also the ES2015 'method shorthand' syntax [3], e.g. `bar` in:

  const foo = {
    bar() {
      console.log('hi');
    }
  }

[3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Method_definitions

> diff --git a/t/t4018/javascript-function-belong-to-IIFE b/t/t4018/javascript-function-belong-to-IIFE
> new file mode 100644
> index 0000000000..6e5fe858c0
> --- /dev/null
> +++ b/t/t4018/javascript-function-belong-to-IIFE
> @@ -0,0 +1,6 @@
> +(function () {
> +  this.$RIGHT = function (needle, modifier) {
> +      let a = 5;
> +      return ChangeMe;
> +  };
> +}).call(aaaa.prototype);

 Does the IIFE matter in this case? This line:

  this.$RIGHT = function (needle, modifier) {

looks extremely similar to the previous test of `foo = function bar()`.

Or perhaps this is meant to demonstrate the edge case of "matching in a
complicated construct"? If so, perhaps we should test other edge cases
like:

   function WRONG() {
     let RIGHT = function (needle, modifier) {
         let a = 5;
         return ChangeMe;
     };
   }

> diff --git a/t/t4018/javascript-function-in-class b/t/t4018/javascript-function-in-class
> new file mode 100644
> index 0000000000..0cc0a26612
> --- /dev/null
> +++ b/t/t4018/javascript-function-in-class
> @@ -0,0 +1,6 @@
> +class Test {
> +  RIGHT() {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> diff --git a/t/t4018/javascript-function-in-class-2 b/t/t4018/javascript-function-in-class-2
> new file mode 100644
> index 0000000000..725495fe55
> --- /dev/null
> +++ b/t/t4018/javascript-function-in-class-2
> @@ -0,0 +1,11 @@
> +class Test {
> +  RIGHT(
> +      aaaaaaaaaa,
> +      bbbbbbbbbb,
> +      cccccccccc,
> +      dddddddddd
> +  ) {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> diff --git a/t/t4018/javascript-function-in-class-3 b/t/t4018/javascript-function-in-class-3
> new file mode 100644
> index 0000000000..e9b20728b2
> --- /dev/null
> +++ b/t/t4018/javascript-function-in-class-3
> @@ -0,0 +1,10 @@
> +class Test {
> +  RIGHT(aaaaaaaaaa,
> +      bbbbbbbbbb,
> +      cccccccccc,
> +      dddddddddd
> +  ) {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> diff --git a/t/t4018/javascript-function-in-object-literal b/t/t4018/javascript-function-in-object-literal
> new file mode 100644
> index 0000000000..021cc706dd
> --- /dev/null
> +++ b/t/t4018/javascript-function-in-object-literal
> @@ -0,0 +1,7 @@
> +const obj = {
> +    RIGHT: function (elems, callback, arg) {
> +        var length, value;
> +        // ...
> +        return ChangeMe
> +    }
> +}
> diff --git a/t/t4018/javascript-generator-function b/t/t4018/javascript-generator-function
> new file mode 100644
> index 0000000000..dc7793939f
> --- /dev/null
> +++ b/t/t4018/javascript-generator-function
> @@ -0,0 +1,4 @@
> +function* RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}
> diff --git a/t/t4018/javascript-generator-function-2 b/t/t4018/javascript-generator-function-2
> new file mode 100644
> index 0000000000..950676a612
> --- /dev/null
> +++ b/t/t4018/javascript-generator-function-2
> @@ -0,0 +1,4 @@
> +function *RIGHT(a, b) {
> +  
> +  return a + b; // ChangeMe
> +}
> diff --git a/t/t4018/javascript-getter-function-in-class b/t/t4018/javascript-getter-function-in-class
> new file mode 100644
> index 0000000000..9a5aee39f7
> --- /dev/null
> +++ b/t/t4018/javascript-getter-function-in-class
> @@ -0,0 +1,6 @@
> +class Test {
> +  get RIGHT() {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> diff --git a/t/t4018/javascript-setter-function-in-class b/t/t4018/javascript-setter-function-in-class
> new file mode 100644
> index 0000000000..dc5f288665
> --- /dev/null
> +++ b/t/t4018/javascript-setter-function-in-class
> @@ -0,0 +1,6 @@
> +class Test {
> +  set RIGHT() {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}
> diff --git a/t/t4018/javascript-skip-function-call-statement b/t/t4018/javascript-skip-function-call-statement
> new file mode 100644
> index 0000000000..321993c27e
> --- /dev/null
> +++ b/t/t4018/javascript-skip-function-call-statement
> @@ -0,0 +1,7 @@
> +class Test {
> +  static RIGHT() {
> +    haha();
> +    haha2()
> +    let b = ChangeMe;
> +  }
> +}
> diff --git a/t/t4018/javascript-skip-keywords b/t/t4018/javascript-skip-keywords
> new file mode 100644
> index 0000000000..5584970b58
> --- /dev/null
> +++ b/t/t4018/javascript-skip-keywords
> @@ -0,0 +1,34 @@
> +function RIGHT(a, b) {
> +  import("./async1")
> +  if (a > 1) {
> +    // ...
> +  }
> +  do {
> +    // ...
> +  } while (i < 5);
> +  for (const element of array1) {
> +    console.log(element)
> +  }
> +  with(o) {
> +    console.log(x)
> +  }
> +  switch (expr) {
> +    case 'a':
> +      // ...
> +      break;
> +    case 'b':
> +      // ...
> +      break;
> +    default:
> +      // ...
> +  }
> +  try {
> +    // ...
> +    return (a + c)
> +  } 
> +  catch (error) {
> +    // ...
> +  }
> +
> +  return a + b; // ChangeMe
> +}
> diff --git a/t/t4018/javascript-static-function-in-class b/t/t4018/javascript-static-function-in-class
> new file mode 100644
> index 0000000000..fbf0b7ca3d
> --- /dev/null
> +++ b/t/t4018/javascript-static-function-in-class
> @@ -0,0 +1,6 @@
> +class Test {
> +  static RIGHT() {
> +    let a = 4;
> +    let b = ChangeMe;
> +  }
> +}

The rest of the test cases look good.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-14 17:20   ` Glen Choo
@ 2022-03-15  7:40     ` Johannes Sixt
  2022-03-15 18:51       ` Glen Choo
  2022-04-03 13:21     ` xing zhi jiang
  1 sibling, 1 reply; 25+ messages in thread
From: Johannes Sixt @ 2022-03-15  7:40 UTC (permalink / raw)
  To: Glen Choo; +Cc: xing zhi jiang, git

Am 14.03.22 um 18:20 schrieb Glen Choo:
> xing zhi jiang <a97410985new@gmail.com> writes:
>> diff --git a/t/t4018/javascript-exports-function b/t/t4018/javascript-exports-function
>> new file mode 100644
>> index 0000000000..63b79f5991
>> --- /dev/null
>> +++ b/t/t4018/javascript-exports-function
>> @@ -0,0 +1,4 @@
>> +exports.RIGHT = function(document) {
>> +    
>> +    return ChangeMe;
>> +}
> 
> I don't think we should include 'exports.foo = bar'. To my knowledge,
> this is _not_ a standard ES feature, but rather the CommonJS module
> system popularized by Node.js [1] and other frameworks. To confirm this,
> I searched the ES spec and did not find any reference to exports.* [2].
> 
> Even if we wanted to support nonstandard 'language features' (and this
> label is tenuous at best, CommonJS is not trying to replace the ES
> modules standard AFAIK), Node.js is also starting to support ES modules,
> so I don't think this is a good long term direction for Git.

It is not a priority to model hunk header regular expressions after some
standard and to ignore stuff that is outside the standard. The goal is
to make them useful in a majority of cases. If there exists a noticable
chunk of code that uses non-standard constructs, then that is worth
being supported.

> 
> [1] https://nodejs.org/api/modules.html
> [2] https://262.ecma-international.org/12.0/#sec-exports

-- Hannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-15  7:40     ` Johannes Sixt
@ 2022-03-15 18:51       ` Glen Choo
  2022-03-15 19:22         ` Junio C Hamano
  2022-04-03 13:20         ` xing zhi jiang
  0 siblings, 2 replies; 25+ messages in thread
From: Glen Choo @ 2022-03-15 18:51 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: xing zhi jiang, git

Johannes Sixt <j6t@kdbg.org> writes:

> Am 14.03.22 um 18:20 schrieb Glen Choo:
>> xing zhi jiang <a97410985new@gmail.com> writes:
>>> diff --git a/t/t4018/javascript-exports-function b/t/t4018/javascript-exports-function
>>> new file mode 100644
>>> index 0000000000..63b79f5991
>>> --- /dev/null
>>> +++ b/t/t4018/javascript-exports-function
>>> @@ -0,0 +1,4 @@
>>> +exports.RIGHT = function(document) {
>>> +    
>>> +    return ChangeMe; >>> +}
>> 
>> I don't think we should include 'exports.foo = bar'. To my knowledge,
>> this is _not_ a standard ES feature, but rather the CommonJS module
>> system popularized by Node.js [1] and other frameworks. To confirm this,
>> I searched the ES spec and did not find any reference to exports.* [2].
>> 
>> Even if we wanted to support nonstandard 'language features' (and this
>> label is tenuous at best, CommonJS is not trying to replace the ES
>> modules standard AFAIK), Node.js is also starting to support ES modules,
>> so I don't think this is a good long term direction for Git.
>
> It is not a priority to model hunk header regular expressions after some
> standard and to ignore stuff that is outside the standard. The goal is
> to make them useful in a majority of cases. If there exists a noticable
> chunk of code that uses non-standard constructs, then that is worth
> being supported.

Interesting, I'll take note. I'm still personally not keen on supporting
CommonJS-only patterns when we are purportedly trying to show diffs for
JavaScript, but if we think this fits the style, I'm happy to oblige.

So the question becomes "Is there a significant amount of code that uses
this pattern?" Probably - this is a fairly common pattern in Node.js
after all. But in my experience,

 module.exports.RIGHT = function(document) {
     
     return ChangeMe;
 }

is even more common. The difference between 'module.exports' and
'exports' isn't worth going into (StackOverflow has all the answers, for
the curious), but if we're taking the approach of supporting CommonJS,
I'd like to be consistent and also support 'module.exports', i.e.
perhaps change:

  "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"

to something like:

  "^((module.)?exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-15 18:51       ` Glen Choo
@ 2022-03-15 19:22         ` Junio C Hamano
  2022-03-15 21:34           ` Glen Choo
  2022-04-03 13:20         ` xing zhi jiang
  1 sibling, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2022-03-15 19:22 UTC (permalink / raw)
  To: Glen Choo; +Cc: Johannes Sixt, xing zhi jiang, git

Glen Choo <chooglen@google.com> writes:

> Interesting, I'll take note. I'm still personally not keen on supporting
> CommonJS-only patterns when we are purportedly trying to show diffs for
> JavaScript, but if we think this fits the style, I'm happy to oblige.

The question is, with these patterns that are aware of CommonJS
convention, would your bog-standard-and-boring vanilla JS code be
detected incorrectly?  Becoming aware of popular conventions without
hurting others would be a good thing.

And the "popular conventions" does not have to be limited to
CommonJS/Node.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-15 19:22         ` Junio C Hamano
@ 2022-03-15 21:34           ` Glen Choo
  2022-04-03 13:24             ` xing zhi jiang
  0 siblings, 1 reply; 25+ messages in thread
From: Glen Choo @ 2022-03-15 21:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, xing zhi jiang, git

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Interesting, I'll take note. I'm still personally not keen on supporting
>> CommonJS-only patterns when we are purportedly trying to show diffs for
>> JavaScript, but if we think this fits the style, I'm happy to oblige.
>
> The question is, with these patterns that are aware of CommonJS
> convention, would your bog-standard-and-boring vanilla JS code be
> detected incorrectly?  Becoming aware of popular conventions without
> hurting others would be a good thing.
>
> And the "popular conventions" does not have to be limited to
> CommonJS/Node.

From the perspective of "'exports' is a special name", yes, we could
detect vanilla JS code 'incorrectly' because, in vanilla JS, the names
'exports' or 'module.exports' are not special. So perhaps, one could
imagine a browser-side script that deals with "imports" and "exports" as
part of their business:

  const exports = {
    quantity: 1,
    type: 'boxes',
  };
  exports.getQuantity = () => {
    foo();
  };

This diff driver would mistakenly detect `exports.getQuantity = () =>
{`.

Although, the more I think about it, the spirit of this patch seems to
be "we want to show headers whenever we think we are in a function", so
we don't actually need to treat 'exports' or 'module.exports' specially
at all, e.g. this case should also pass our diff driver tests:

  const foo = {};
  foo.RIGHT = () => {

    ChangeMe();
  };

and if we do this, we will correctly handle 'exports' and
'module.exports' anyway by virtue of them being plain old JS objects.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-13 21:54   ` Johannes Sixt
@ 2022-04-03 13:17     ` xing zhi jiang
  0 siblings, 0 replies; 25+ messages in thread
From: xing zhi jiang @ 2022-04-03 13:17 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Sorry for the late reply. I've been busy the last two weeks.

On Mon, 14 Mar 2022 at 05:54, Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 12.03.22 um 17:48 schrieb xing zhi jiang:
> > diff --git a/userdiff.c b/userdiff.c
> > index 8578cb0d12..51bfe4021d 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -168,6 +168,38 @@ PATTERNS("java",
> >        "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> >        "|[-+*/<>%&^|=!]="
> >        "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> > +
> > +PATTERNS("javascript",
> > +      /* don't match the expression may contain parenthesis, because it is not a function declaration */
> > +      "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
> > +      /* don't match statement */
> > +      "!;\n"
> > +      /* match normal function */
> > +      "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
> > +      /* match JavaScript variable declaration with a lambda expression */
> > +      "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> > +      "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
>
> It would help readability if this second line of this regex were
> indented because it is a continuation of the first line.
>
This will be fixed in the v3 patch.

> > +      /* match exports for anonymous fucntion */
> > +      "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
> > +      /* match assign function to LHS */
> > +      "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"
>
> This should be written as
>
>          "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*)?\\(.*)\n"
>
> Notice that the whitespace after the identifier can only appear when
> there is actually an identifier. The point is to reduce the different
> matches permitted by the sub-expression "[\t ]*[\t ]*" when there is no
> identifier in the text.
>
> Can the keyword function ever be followed by a number? I guess not. Then
> [$_[:alpha:]][$_[:alnum:]]* could be reduced to [$_[:alnum:]]+
This will be fixed in the v3 patch.

> > +      /* match normal function in object literal */
> > +      "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n"
> > +      /* don't match the function in class, which has more than one ident level */
> > +      "!^(\t{2,}|[ ]{5,})\n"
> > +      /* match function in class */
> > +      "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",> +    /* word regex */
> > +      /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, DecimalLiteral and its big version */
> > +      "(0[xXoObB])?[0-9a-fA-F][_0-9a-fA-F]*n?"
> > +      /* DecimalLiteral may be float */
> > +      "|(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
>
> Having alternatives that begin with an optional part make the regex
> evaluation comparatively inefficient. In particular, both alternatives
> above match a decimal integer. I suggest to have the first alternative
> only for hex, octal, and binary integers, and the second for all decimal
> numbers including floatingpoint:
>
>          /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and
> their big versions */
>          "0[xXoObB][_0-9a-fA-F]+n?"
>          /* DecimalLiteral may be float */
>          "|[0-9][_0-9]*(\\.[_0-9]*|n)?([eE][+-]?[_0-9]+)?"
>
> and if floating point literals can begin with a decimal point, then we
> also need
>
>          "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
>
I agree on separate decimal regex from others. And in JavaScript
floating numbers can start with a dot. So need add new regex -
"|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?".
This will be fixed in the v3 patch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-15 18:51       ` Glen Choo
  2022-03-15 19:22         ` Junio C Hamano
@ 2022-04-03 13:20         ` xing zhi jiang
  1 sibling, 0 replies; 25+ messages in thread
From: xing zhi jiang @ 2022-04-03 13:20 UTC (permalink / raw)
  To: Glen Choo; +Cc: git

On Wed, 16 Mar 2022 at 02:51, Glen Choo <chooglen@google.com> wrote:
>
> Johannes Sixt <j6t@kdbg.org> writes:
> So the question becomes "Is there a significant amount of code that uses
> this pattern?" Probably - this is a fairly common pattern in Node.js
> after all. But in my experience,
>
>  module.exports.RIGHT = function(document) {
>
>      return ChangeMe;
>  }
>
> is even more common. The difference between 'module.exports' and
> 'exports' isn't worth going into (StackOverflow has all the answers, for
> the curious), but if we're taking the approach of supporting CommonJS,
> I'd like to be consistent and also support 'module.exports', i.e.
> perhaps change:
>
>   "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
>
> to something like:
>
>   "^((module.)?exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
I agree on also support the "module.exports". This will be applied on
the v3 patch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-14 17:20   ` Glen Choo
  2022-03-15  7:40     ` Johannes Sixt
@ 2022-04-03 13:21     ` xing zhi jiang
  1 sibling, 0 replies; 25+ messages in thread
From: xing zhi jiang @ 2022-04-03 13:21 UTC (permalink / raw)
  To: Glen Choo; +Cc: git

Sorry for the late reply. Because I was busy last two weeks. And
grateful for your review.

On Tue, 15 Mar 2022 at 01:20, Glen Choo <chooglen@google.com> wrote:
>
> > diff --git a/t/t4018/javascript-function b/t/t4018/javascript-function
> > new file mode 100644
> > index 0000000000..0cc0bf54e7
> > --- /dev/null
> > +++ b/t/t4018/javascript-function
> > @@ -0,0 +1,4 @@
> > +function RIGHT(a, b) {
> > +
> > +  return a + b; // ChangeMe
> > +}
> > diff --git a/t/t4018/javascript-function-2 b/t/t4018/javascript-function-2
> > new file mode 100644
> > index 0000000000..06cfb779f0
> > --- /dev/null
> > +++ b/t/t4018/javascript-function-2
> > @@ -0,0 +1,10 @@
> > +function test(a, b) {
> > +  return {
> > +                     RIGHT: function () {
> > +                             currentUpdateRemovedChunks.forEach(function (chunkId) {
> > +                                     delete $installedChunks$[chunkId];
> > +                             });
> > +                             currentUpdateRemovedChunks = ChangeMe;
> > +   }
> > +  }
> > +}
>
> There is also the ES2015 'method shorthand' syntax [3], e.g. `bar` in:
>
>   const foo = {
>     bar() {
>       console.log('hi');
>     }
>   }
>
> [3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Method_definitions
>
The ES2015 method shorthand is already matched by the regex for
function in class
"^[\t ]*((static[\t ]+)?((async|get|set)[\t
]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)"
So I update the comment for this regex in the v3 patch. The comment is
"/* match function in class and ES5 method shorthand */"

> > diff --git a/t/t4018/javascript-function-belong-to-IIFE b/t/t4018/javascript-function-belong-to-IIFE
> > new file mode 100644
> > index 0000000000..6e5fe858c0
> > --- /dev/null
> > +++ b/t/t4018/javascript-function-belong-to-IIFE
> > @@ -0,0 +1,6 @@
> > +(function () {
> > +  this.$RIGHT = function (needle, modifier) {
> > +      let a = 5;
> > +      return ChangeMe;
> > +  };
> > +}).call(aaaa.prototype);
>
>  Does the IIFE matter in this case? This line:
>
>   this.$RIGHT = function (needle, modifier) {
>
> looks extremely similar to the previous test of `foo = function bar()`.
>
> Or perhaps this is meant to demonstrate the edge case of "matching in a
> complicated construct"? If so, perhaps we should test other edge cases
> like:
>
>    function WRONG() {
>      let RIGHT = function (needle, modifier) {
>          let a = 5;
>          return ChangeMe;
>      };
>    }
>
Currently, I realize this test case is redundant. Because I trust the
function keyword. So the regex "^(.*=[\t ]*function[\t
]*([$_[:alnum:]]+[\t ]*)?\\(.*)\n" is loose for LHS.
I will remove this case in the v3 patch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.
  2022-03-15 21:34           ` Glen Choo
@ 2022-04-03 13:24             ` xing zhi jiang
  0 siblings, 0 replies; 25+ messages in thread
From: xing zhi jiang @ 2022-04-03 13:24 UTC (permalink / raw)
  To: Glen Choo; +Cc: git

On Wed, 16 Mar 2022 at 05:34, Glen Choo <chooglen@google.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Glen Choo <chooglen@google.com> writes:
> >
> >> Interesting, I'll take note. I'm still personally not keen on supporting
> >> CommonJS-only patterns when we are purportedly trying to show diffs for
> >> JavaScript, but if we think this fits the style, I'm happy to oblige.
> >
> > The question is, with these patterns that are aware of CommonJS
> > convention, would your bog-standard-and-boring vanilla JS code be
> > detected incorrectly?  Becoming aware of popular conventions without
> > hurting others would be a good thing.
> >
> > And the "popular conventions" does not have to be limited to
> > CommonJS/Node.
>
> From the perspective of "'exports' is a special name", yes, we could
> detect vanilla JS code 'incorrectly' because, in vanilla JS, the names
> 'exports' or 'module.exports' are not special. So perhaps, one could
> imagine a browser-side script that deals with "imports" and "exports" as
> part of their business:
>
>   const exports = {
>     quantity: 1,
>     type: 'boxes',
>   };
>   exports.getQuantity = () => {
>     foo();
>   };
>
> This diff driver would mistakenly detect `exports.getQuantity = () =>
> {`.
>
> Although, the more I think about it, the spirit of this patch seems to
> be "we want to show headers whenever we think we are in a function", so
> we don't actually need to treat 'exports' or 'module.exports' specially
> at all, e.g. this case should also pass our diff driver tests:
>
>   const foo = {};
>   foo.RIGHT = () => {
>
>     ChangeMe();
>   };
>
> and if we do this, we will correctly handle 'exports' and
> 'module.exports' anyway by virtue of them being plain old JS objects.
The spirit of this patch is to show headers when we are in the
function body(which has a large code block). Because this can help
users understand the context.
And prevent mismatch non-related function. ex:
    function WRONG() {
        // ...
    }
    const foo = {};
    foo.RIGHT = () => {

      ChangeMe();
    };
if we don't match "foo.RIGHT = () => {". It may match the "function
WRONG() {". This would be very misleading.
So the v3 patch. I do not treat `exports` as a special keyword. And
can match the code like "foo.RIGHT = () => {".
The regex would be
"((module\\.)?[$_[:alpha:]][$_[:alnum:]]*\\.[$_[:alpha:]][$_[:alnum:]]*[\t
]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)"

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [GSoC][PATCH v3] Add a diff driver for JavaScript languages.
  2022-03-04 13:08 [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language xing zhi jiang
                   ` (2 preceding siblings ...)
  2022-03-12 16:48 ` [GSoC][PATCH v2] Add a diff driver for JavaScript languages xing zhi jiang
@ 2022-04-03 13:25 ` xing zhi jiang
  2022-04-03 14:40   ` Johannes Sixt
                     ` (2 more replies)
  3 siblings, 3 replies; 25+ messages in thread
From: xing zhi jiang @ 2022-04-03 13:25 UTC (permalink / raw)
  To: git; +Cc: j6t, chooglen

In the xfunction part that matches normal functions,
a variable declaration with an assignment of function, the function declaration
in the class, named export for function in ECMA2015, CommonJS exports for
named function, and also the function is object literal's property[1].
The special match is this would match the popular framework test case
pattern(most is same as regex for matching function in class).

And in the word regex part, that matches numbers, punctuations, and also the
JavaScript identifier.
This part reference the formal ECMA specification[2].

[1]https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
[2]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar

Signed-off-by: xing zhi jiang <a97410985new@gmail.com>
---
I write a simple python script to help me observe the diff's results easier. 
And the result is in https://github.com/a97410985/user_diff_observe_tool.
I observe the six popular JavaScript projects - Jquery, NodeJS, React, TensorflowJS, 
Vue, and Webpack. The result is not perfect but acceptable.
Most of the mismatch is related to the function in class. Ex:
https://github.com/a97410985/user_diff_observe_tool/blob/092bd14c849b9a3f61bb872d730e38b58cc9c303/diff_result/jquery_result.diff#L1094.
But this is inevitable.
And besides, apply the code reviews suggestions. I add some new regex.
1. "!^[\t ]*[$_[:alpha:]][$_[:alnum:]][\t ]*\\(.*\\)\\.\n"
   this is for the case below
   `
    function RIGHT(a, b) {

        WRONG(spy.calls.count()).toBe(1)
        return a + b; // ChangeMe
    }
   `
   if we don't skip chained function call. This would happen mismatch.
2. "^[\t ]*(QUnit.test\\(.*)\n"
   I oberve there are many test case pattern in JavaScript. And most of popular JavaScript
   unit test framework's test case pattern is function call can be matched by the regex, which
   match "function in class". So it is no problem. But `QUnit framework` don't matched by that.
   Then add this regex into JavaScript userdiff.
   And the reason includes test case pattern has two
   1. This is inevitable. Because we can't distinguish that with function in class.
   2. This can help the user understand what change is happening in which test case. And if 
      don't match the test case, it has the large possibility to match no related function.
      
And I move the regex, which match normal function in object literal, under the regex 
""!^(\t{2,}|[ ]{5,})\n".
Because I found it would many mismatch if function in object literal has too many indent level(
the function body usually just few lines of code, and the below has large of code. If has change
on below, it would happen mismatch).
And there are two example for hunk header that match test case pattern.
      
Range-diff against v2:
1:  3b326bd2b6 ! 1:  2791a7b6b8 Add a diff driver for JavaScript languages.
    @@ Commit message
     
         In the xfunction part that matches normal functions,
         a variable declaration with an assignment of function, the function declaration
    -    in the class, and also the function is object literal's property[1].
    +    in the class, named export for function in ECMA2015, CommonJS exports for
    +    named function, and also the function is object literal's property[1].
    +    The special match is this would match the popular framework test case
    +    pattern(most is same as regex for matching function in class).
     
         And in the word regex part, that matches numbers, punctuations, and also the
         JavaScript identifier.
    @@ t/t4018/javascript-async-function (new)
     +async function RIGHT(a, b) {
     +  
     +  return a + b; // ChangeMe
    ++}
    +
    + ## t/t4018/javascript-es5-method-shorthand (new) ##
    +@@
    ++const foo = {
    ++    RIGHT() {
    ++        
    ++        console.log('ChangeMe');
    ++    }
     +}
     
      ## t/t4018/javascript-export-async-function (new) ##
    @@ t/t4018/javascript-exports-anomyous-function-2 (new)
     +exports.RIGHT = (a, b, runtime) => {
     +	
     +    return ChangeMe;
    ++};
    +
    + ## t/t4018/javascript-exports-anomyous-function-3 (new) ##
    +@@
    ++module.exports.RIGHT = (a, b, runtime) => {
    ++	
    ++    return ChangeMe;
    ++};
    +
    + ## t/t4018/javascript-exports-anomyous-function-4 (new) ##
    +@@
    ++module.exports = ( RIGHT ) => {
    ++	
    ++    return ChangeMe;
     +};
     
      ## t/t4018/javascript-exports-function (new) ##
    @@ t/t4018/javascript-function (new)
     +  return a + b; // ChangeMe
     +}
     
    - ## t/t4018/javascript-function-2 (new) ##
    -@@
    -+function test(a, b) {
    -+  return {
    -+			RIGHT: function () {
    -+				currentUpdateRemovedChunks.forEach(function (chunkId) {
    -+					delete $installedChunks$[chunkId];
    -+				});
    -+				currentUpdateRemovedChunks = ChangeMe;
    -+   }
    -+  }
    -+}
    -
    - ## t/t4018/javascript-function-belong-to-IIFE (new) ##
    -@@
    -+(function () {
    -+  this.$RIGHT = function (needle, modifier) {
    -+      let a = 5;
    -+      return ChangeMe;
    -+  };
    -+}).call(aaaa.prototype);
    -
      ## t/t4018/javascript-function-in-class (new) ##
     @@
     +class Test {
    @@ t/t4018/javascript-setter-function-in-class (new)
     +    let a = 4;
     +    let b = ChangeMe;
     +  }
    ++}
    +
    + ## t/t4018/javascript-skip-chained-function (new) ##
    +@@
    ++function RIGHT(a, b) {
    ++
    ++  expect(spy.calls.count()).toBe(1)
    ++  return a + b; // ChangeMe
     +}
     
      ## t/t4018/javascript-skip-function-call-statement (new) ##
    @@ userdiff.c: PATTERNS("java",
     +	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
     +	 /* don't match statement */
     +	 "!;\n"
    -+	 /* match normal function */
    ++	 /* match normal function or named export for function in ECMA2015 */
     +	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
    -+	 /* match JavaScript variable declaration with a lambda expression */
    -+	 "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
    -+	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
    -+	 /* match exports for anonymous fucntion */
    -+	 "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
    -+	 /* match assign function to LHS */
    -+	 "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"
    -+	 /* match normal function in object literal */
    -+	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n"
    -+	 /* don't match the function in class, which has more than one ident level */
    ++	 /* match JavaScript variable declaration with a lambda expression at top level */
    ++	 "^((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
    ++		"(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
    ++	 /* match object's property assignment by anonymous function and CommonJS exports for named function */
    ++	 "^((module\\.)?[$_[:alpha:]][$_[:alnum:]]*\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(async[\t ]+)?(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
    ++	 /* match assign function to LHS with explicit function keyword */
    ++	 "^(.*=[\t ]*function[\t ]*([$_[:alnum:]]+[\t ]*)?\\(.*)\n"
    ++	 /* popular unit testing framework test case pattern. Most of framework pattern is match by regex for "function in class" */
    ++	 "^[\t ]*(QUnit.test\\(.*)\n"
    ++	 /* don't match the function in class or in object literal, which has more than one ident level */
     +	 "!^(\t{2,}|[ ]{5,})\n"
    -+	 /* match function in class */
    ++	 /* match normal function in object literal */
    ++	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function.*)\n"
    ++	 /* don't match chained method call */
    ++	 "!^[\t ]*[$_[:alpha:]][$_[:alnum:]][\t ]*\\(.*\\)\\.\n"
    ++	 /* match function in class and ES5 method shorthand */
     +	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",
     +	 /* word regex */
    -+	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, DecimalLiteral and its big version */
    -+	 "(0[xXoObB])?[0-9a-fA-F][_0-9a-fA-F]*n?"
    -+	 /* DecimalLiteral may be float */
    -+	 "|(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
    ++	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and its big version */
    ++	 "0[xXoObB][_0-9a-fA-F]+n?"
    ++	 /* DecimalLiteral and its big version*/
    ++	 "|[0-9][_0-9]*(\\.[0-9][_0-9]*|n)?([eE][+-]?[_0-9]+)?"
    ++	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
     +	 /* punctuations */
     +	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
     +	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="

 Documentation/gitattributes.txt               |  2 +
 ...avascript-assignment-of-anonymous-function |  4 ++
 .../javascript-assignment-of-arrow-function   |  4 ++
 .../javascript-assignment-of-arrow-function-2 |  4 ++
 .../javascript-assignment-of-arrow-function-3 |  4 ++
 .../javascript-assignment-of-named-function   |  4 ++
 t/t4018/javascript-async-function             |  4 ++
 t/t4018/javascript-es5-method-shorthand       |  6 +++
 t/t4018/javascript-export-async-function      |  4 ++
 t/t4018/javascript-export-function            |  4 ++
 t/t4018/javascript-exports-anomyous-function  |  4 ++
 .../javascript-exports-anomyous-function-2    |  4 ++
 .../javascript-exports-anomyous-function-3    |  4 ++
 .../javascript-exports-anomyous-function-4    |  4 ++
 t/t4018/javascript-exports-function           |  4 ++
 t/t4018/javascript-function                   |  4 ++
 t/t4018/javascript-function-in-class          |  6 +++
 t/t4018/javascript-function-in-class-2        | 11 ++++
 t/t4018/javascript-function-in-class-3        | 10 ++++
 t/t4018/javascript-function-in-object-literal |  7 +++
 t/t4018/javascript-generator-function         |  4 ++
 t/t4018/javascript-generator-function-2       |  4 ++
 t/t4018/javascript-getter-function-in-class   |  6 +++
 t/t4018/javascript-setter-function-in-class   |  6 +++
 t/t4018/javascript-skip-chained-function      |  5 ++
 .../javascript-skip-function-call-statement   |  7 +++
 t/t4018/javascript-skip-keywords              | 34 ++++++++++++
 t/t4018/javascript-static-function-in-class   |  6 +++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/javascript/expect                     | 54 +++++++++++++++++++
 t/t4034/javascript/post                       | 33 ++++++++++++
 t/t4034/javascript/pre                        | 33 ++++++++++++
 userdiff.c                                    | 37 +++++++++++++
 33 files changed, 328 insertions(+)
 create mode 100644 t/t4018/javascript-assignment-of-anonymous-function
 create mode 100644 t/t4018/javascript-assignment-of-arrow-function
 create mode 100644 t/t4018/javascript-assignment-of-arrow-function-2
 create mode 100644 t/t4018/javascript-assignment-of-arrow-function-3
 create mode 100644 t/t4018/javascript-assignment-of-named-function
 create mode 100644 t/t4018/javascript-async-function
 create mode 100644 t/t4018/javascript-es5-method-shorthand
 create mode 100644 t/t4018/javascript-export-async-function
 create mode 100644 t/t4018/javascript-export-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function-2
 create mode 100644 t/t4018/javascript-exports-anomyous-function-3
 create mode 100644 t/t4018/javascript-exports-anomyous-function-4
 create mode 100644 t/t4018/javascript-exports-function
 create mode 100644 t/t4018/javascript-function
 create mode 100644 t/t4018/javascript-function-in-class
 create mode 100644 t/t4018/javascript-function-in-class-2
 create mode 100644 t/t4018/javascript-function-in-class-3
 create mode 100644 t/t4018/javascript-function-in-object-literal
 create mode 100644 t/t4018/javascript-generator-function
 create mode 100644 t/t4018/javascript-generator-function-2
 create mode 100644 t/t4018/javascript-getter-function-in-class
 create mode 100644 t/t4018/javascript-setter-function-in-class
 create mode 100644 t/t4018/javascript-skip-chained-function
 create mode 100644 t/t4018/javascript-skip-function-call-statement
 create mode 100644 t/t4018/javascript-skip-keywords
 create mode 100644 t/t4018/javascript-static-function-in-class
 create mode 100644 t/t4034/javascript/expect
 create mode 100644 t/t4034/javascript/post
 create mode 100644 t/t4034/javascript/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 60984a4682..a8e3e4d735 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -828,6 +828,8 @@ patterns are available:
 
 - `java` suitable for source code in the Java language.
 
+- `javascript` suitable for source code in the JavaScript language.
+
 - `markdown` suitable for Markdown documents.
 
 - `matlab` suitable for source code in the MATLAB and Octave languages.
diff --git a/t/t4018/javascript-assignment-of-anonymous-function b/t/t4018/javascript-assignment-of-anonymous-function
new file mode 100644
index 0000000000..b6f2ccccfc
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-anonymous-function
@@ -0,0 +1,4 @@
+const RIGHT = function (a, b) {
+	
+    return a + b; // ChangeMe
+};
diff --git a/t/t4018/javascript-assignment-of-arrow-function b/t/t4018/javascript-assignment-of-arrow-function
new file mode 100644
index 0000000000..24ce517b7a
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-arrow-function
@@ -0,0 +1,4 @@
+const RIGHT = (a, b) => {
+	
+    return a + b; // ChangeMe
+};
diff --git a/t/t4018/javascript-assignment-of-arrow-function-2 b/t/t4018/javascript-assignment-of-arrow-function-2
new file mode 100644
index 0000000000..bbf5de369e
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-arrow-function-2
@@ -0,0 +1,4 @@
+const RIGHT = (a, b)=>{
+	
+    return a + b; // ChangeMe
+};
diff --git a/t/t4018/javascript-assignment-of-arrow-function-3 b/t/t4018/javascript-assignment-of-arrow-function-3
new file mode 100644
index 0000000000..4a07aa3259
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-arrow-function-3
@@ -0,0 +1,4 @@
+const RIGHT=test=>{
+	
+    return test + 1; // ChangeMe
+};
diff --git a/t/t4018/javascript-assignment-of-named-function b/t/t4018/javascript-assignment-of-named-function
new file mode 100644
index 0000000000..bfc486ebef
--- /dev/null
+++ b/t/t4018/javascript-assignment-of-named-function
@@ -0,0 +1,4 @@
+const RIGHT = function test (a, b) {
+	
+    return a + b; // ChangeMe
+};
diff --git a/t/t4018/javascript-async-function b/t/t4018/javascript-async-function
new file mode 100644
index 0000000000..993e6926bf
--- /dev/null
+++ b/t/t4018/javascript-async-function
@@ -0,0 +1,4 @@
+async function RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-es5-method-shorthand b/t/t4018/javascript-es5-method-shorthand
new file mode 100644
index 0000000000..01fc0cc34b
--- /dev/null
+++ b/t/t4018/javascript-es5-method-shorthand
@@ -0,0 +1,6 @@
+const foo = {
+    RIGHT() {
+        
+        console.log('ChangeMe');
+    }
+}
diff --git a/t/t4018/javascript-export-async-function b/t/t4018/javascript-export-async-function
new file mode 100644
index 0000000000..fecbd669d7
--- /dev/null
+++ b/t/t4018/javascript-export-async-function
@@ -0,0 +1,4 @@
+export async function RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-export-function b/t/t4018/javascript-export-function
new file mode 100644
index 0000000000..b5acbb2b08
--- /dev/null
+++ b/t/t4018/javascript-export-function
@@ -0,0 +1,4 @@
+export function RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-exports-anomyous-function b/t/t4018/javascript-exports-anomyous-function
new file mode 100644
index 0000000000..6786cbda8d
--- /dev/null
+++ b/t/t4018/javascript-exports-anomyous-function
@@ -0,0 +1,4 @@
+exports.setFlagged = RIGHT => {
+	
+    return ChangeMe;
+};
diff --git a/t/t4018/javascript-exports-anomyous-function-2 b/t/t4018/javascript-exports-anomyous-function-2
new file mode 100644
index 0000000000..883569f40d
--- /dev/null
+++ b/t/t4018/javascript-exports-anomyous-function-2
@@ -0,0 +1,4 @@
+exports.RIGHT = (a, b, runtime) => {
+	
+    return ChangeMe;
+};
diff --git a/t/t4018/javascript-exports-anomyous-function-3 b/t/t4018/javascript-exports-anomyous-function-3
new file mode 100644
index 0000000000..bc78ed87eb
--- /dev/null
+++ b/t/t4018/javascript-exports-anomyous-function-3
@@ -0,0 +1,4 @@
+module.exports.RIGHT = (a, b, runtime) => {
+	
+    return ChangeMe;
+};
diff --git a/t/t4018/javascript-exports-anomyous-function-4 b/t/t4018/javascript-exports-anomyous-function-4
new file mode 100644
index 0000000000..b952cf2a5a
--- /dev/null
+++ b/t/t4018/javascript-exports-anomyous-function-4
@@ -0,0 +1,4 @@
+module.exports = ( RIGHT ) => {
+	
+    return ChangeMe;
+};
diff --git a/t/t4018/javascript-exports-function b/t/t4018/javascript-exports-function
new file mode 100644
index 0000000000..63b79f5991
--- /dev/null
+++ b/t/t4018/javascript-exports-function
@@ -0,0 +1,4 @@
+exports.RIGHT = function(document) {
+    
+    return ChangeMe;
+}
diff --git a/t/t4018/javascript-function b/t/t4018/javascript-function
new file mode 100644
index 0000000000..0cc0bf54e7
--- /dev/null
+++ b/t/t4018/javascript-function
@@ -0,0 +1,4 @@
+function RIGHT(a, b) {
+
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-function-in-class b/t/t4018/javascript-function-in-class
new file mode 100644
index 0000000000..0cc0a26612
--- /dev/null
+++ b/t/t4018/javascript-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-function-in-class-2 b/t/t4018/javascript-function-in-class-2
new file mode 100644
index 0000000000..725495fe55
--- /dev/null
+++ b/t/t4018/javascript-function-in-class-2
@@ -0,0 +1,11 @@
+class Test {
+  RIGHT(
+      aaaaaaaaaa,
+      bbbbbbbbbb,
+      cccccccccc,
+      dddddddddd
+  ) {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-function-in-class-3 b/t/t4018/javascript-function-in-class-3
new file mode 100644
index 0000000000..e9b20728b2
--- /dev/null
+++ b/t/t4018/javascript-function-in-class-3
@@ -0,0 +1,10 @@
+class Test {
+  RIGHT(aaaaaaaaaa,
+      bbbbbbbbbb,
+      cccccccccc,
+      dddddddddd
+  ) {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-function-in-object-literal b/t/t4018/javascript-function-in-object-literal
new file mode 100644
index 0000000000..021cc706dd
--- /dev/null
+++ b/t/t4018/javascript-function-in-object-literal
@@ -0,0 +1,7 @@
+const obj = {
+    RIGHT: function (elems, callback, arg) {
+        var length, value;
+        // ...
+        return ChangeMe
+    }
+}
diff --git a/t/t4018/javascript-generator-function b/t/t4018/javascript-generator-function
new file mode 100644
index 0000000000..dc7793939f
--- /dev/null
+++ b/t/t4018/javascript-generator-function
@@ -0,0 +1,4 @@
+function* RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-generator-function-2 b/t/t4018/javascript-generator-function-2
new file mode 100644
index 0000000000..950676a612
--- /dev/null
+++ b/t/t4018/javascript-generator-function-2
@@ -0,0 +1,4 @@
+function *RIGHT(a, b) {
+  
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-getter-function-in-class b/t/t4018/javascript-getter-function-in-class
new file mode 100644
index 0000000000..9a5aee39f7
--- /dev/null
+++ b/t/t4018/javascript-getter-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  get RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-setter-function-in-class b/t/t4018/javascript-setter-function-in-class
new file mode 100644
index 0000000000..dc5f288665
--- /dev/null
+++ b/t/t4018/javascript-setter-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  set RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-skip-chained-function b/t/t4018/javascript-skip-chained-function
new file mode 100644
index 0000000000..4ce23901cf
--- /dev/null
+++ b/t/t4018/javascript-skip-chained-function
@@ -0,0 +1,5 @@
+function RIGHT(a, b) {
+
+  expect(spy.calls.count()).toBe(1)
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-skip-function-call-statement b/t/t4018/javascript-skip-function-call-statement
new file mode 100644
index 0000000000..321993c27e
--- /dev/null
+++ b/t/t4018/javascript-skip-function-call-statement
@@ -0,0 +1,7 @@
+class Test {
+  static RIGHT() {
+    haha();
+    haha2()
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4018/javascript-skip-keywords b/t/t4018/javascript-skip-keywords
new file mode 100644
index 0000000000..5584970b58
--- /dev/null
+++ b/t/t4018/javascript-skip-keywords
@@ -0,0 +1,34 @@
+function RIGHT(a, b) {
+  import("./async1")
+  if (a > 1) {
+    // ...
+  }
+  do {
+    // ...
+  } while (i < 5);
+  for (const element of array1) {
+    console.log(element)
+  }
+  with(o) {
+    console.log(x)
+  }
+  switch (expr) {
+    case 'a':
+      // ...
+      break;
+    case 'b':
+      // ...
+      break;
+    default:
+      // ...
+  }
+  try {
+    // ...
+    return (a + c)
+  } 
+  catch (error) {
+    // ...
+  }
+
+  return a + b; // ChangeMe
+}
diff --git a/t/t4018/javascript-static-function-in-class b/t/t4018/javascript-static-function-in-class
new file mode 100644
index 0000000000..fbf0b7ca3d
--- /dev/null
+++ b/t/t4018/javascript-static-function-in-class
@@ -0,0 +1,6 @@
+class Test {
+  static RIGHT() {
+    let a = 4;
+    let b = ChangeMe;
+  }
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index d5abcf4b4c..33073edeca 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -324,6 +324,7 @@ test_language_driver dts
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
+test_language_driver javascript
 test_language_driver matlab
 test_language_driver objc
 test_language_driver pascal
diff --git a/t/t4034/javascript/expect b/t/t4034/javascript/expect
new file mode 100644
index 0000000000..419d61903b
--- /dev/null
+++ b/t/t4034/javascript/expect
@@ -0,0 +1,54 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 18f4796..46f9b62 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,33 +1,33 @@<RESET>
+// DecimalLiteral<RESET>
+<RED>123<RESET>
+<RED>0.123<RESET>
+<RED>.123<RESET>
+<RED>0.123e+5<RESET>
+<RED>0.123E+5<RESET>
+<RED>0.123e5<RESET>
+<RED>1222222222222222223334444n<RESET><GREEN>124<RESET>
+<GREEN>0.124<RESET>
+<GREEN>.124<RESET>
+<GREEN>0.123e-5<RESET>
+<GREEN>0.123E-5<RESET>
+<GREEN>0.123E5<RESET>
+<GREEN>12222222222222222233344445n<RESET>
+// HexIntegerLiteral<RESET>
+<RED>0x10<RESET>
+<RED>0X6Fa1<RESET>
+<RED>0x123_456<RESET>
+<RED>0x1234182989812f1289an<RESET><GREEN>0x11<RESET>
+<GREEN>0X5Fa1<RESET>
+<GREEN>0x123_756<RESET>
+<GREEN>0x1234182989812f1289bn<RESET>
+// OctalIntegerLiteral<RESET>
+<RED>05<RESET>
+<RED>0o6<RESET>
+<RED>0O7<RESET>
+<RED>0512_567<RESET>
+<RED>0o424242424242424242424242424242666666n<RESET><GREEN>06<RESET>
+<GREEN>0o5<RESET>
+<GREEN>0O4<RESET>
+<GREEN>0511_567<RESET>
+<GREEN>0o424242424242424242424242424242666667n<RESET>
+// BinaryIntegerLiteral<RESET>
+<RED>0b1001<RESET>
+<RED>0B0110<RESET>
+<RED>0b0001_1001_0011<RESET>
+<RED>0b1111111111111111111111111111111111111n<RESET><GREEN>0b1101<RESET>
+<GREEN>0B0010<RESET>
+<GREEN>0b0001_1101_0011<RESET>
+<GREEN>0b11111111111111000011111111111111111n<RESET>
+// punctuations<RESET>
+{<RED>a<RESET><GREEN>b<RESET>} (<RED>a<RESET><GREEN>b<RESET>)
+<RED>a<RESET><GREEN>b<RESET>;
+[<RED>1,<RESET>2<GREEN>,3<RESET>]
+[<RED>1, 2,<RESET> ...<RED>params<RESET><GREEN>params_v2<RESET> ]
+a<RED><=<RESET><GREEN>=<RESET>2 a<RED>>=<RESET><GREEN>=<RESET>2 a<RED>==<RESET><GREEN>=<RESET>2 a<RED>!=<RESET><GREEN>=<RESET>2 a<RED>===<RESET><GREEN>=<RESET>2 a<RED>!==<RESET><GREEN>=<RESET>2 a<RED>^=<RESET><GREEN>=<RESET>2 a<RED>=><RESET><GREEN>=<RESET>2
+a<RED>+=<RESET><GREEN>-=<RESET>b a<RED>*=<RESET><GREEN>%=<RESET>b a<RED>**=<RESET><GREEN>&&=<RESET>b a<RED>||=<RESET><GREEN>|=<RESET>b
+b<RED>+<RESET><GREEN>-<RESET>c a<RED>--<RESET><GREEN>++<RESET> a<RED>>><RESET><GREEN><<<RESET>b a<RED>>>><RESET><GREEN>>>>=<RESET>b a<RED>>>=<RESET><GREEN><<=<RESET>b
+a<RED>&&<RESET><GREEN>&<RESET>b a<RED>||<RESET><GREEN>|<RESET>b a<RED>&&=<RESET><GREEN>??=<RESET>b
diff --git a/t/t4034/javascript/post b/t/t4034/javascript/post
new file mode 100644
index 0000000000..46f9b627e4
--- /dev/null
+++ b/t/t4034/javascript/post
@@ -0,0 +1,33 @@
+// DecimalLiteral
+124
+0.124
+.124
+0.123e-5
+0.123E-5
+0.123E5
+12222222222222222233344445n
+// HexIntegerLiteral
+0x11
+0X5Fa1
+0x123_756
+0x1234182989812f1289bn
+// OctalIntegerLiteral
+06
+0o5
+0O4
+0511_567
+0o424242424242424242424242424242666667n
+// BinaryIntegerLiteral
+0b1101
+0B0010
+0b0001_1101_0011
+0b11111111111111000011111111111111111n
+// punctuations
+{b} (b)
+b;
+[2,3]
+[ ...params_v2 ]
+a=2 a=2 a=2 a=2 a=2 a=2 a=2 a=2
+a-=b a%=b a&&=b a|=b
+b-c a++ a<<b a>>>=b a<<=b
+a&b a|b a??=b
diff --git a/t/t4034/javascript/pre b/t/t4034/javascript/pre
new file mode 100644
index 0000000000..18f479688c
--- /dev/null
+++ b/t/t4034/javascript/pre
@@ -0,0 +1,33 @@
+// DecimalLiteral
+123
+0.123
+.123
+0.123e+5
+0.123E+5
+0.123e5
+1222222222222222223334444n
+// HexIntegerLiteral
+0x10
+0X6Fa1
+0x123_456
+0x1234182989812f1289an
+// OctalIntegerLiteral
+05
+0o6
+0O7
+0512_567
+0o424242424242424242424242424242666666n
+// BinaryIntegerLiteral
+0b1001
+0B0110
+0b0001_1001_0011
+0b1111111111111111111111111111111111111n
+// punctuations
+{a} (a)
+a;
+[1,2]
+[ 1, 2, ...params ]
+a<=2 a>=2 a==2 a!=2 a===2 a!==2 a^=2 a=>2
+a+=b a*=b a**=b a||=b
+b+c a-- a>>b a>>>b a>>=b
+a&&b a||b a&&=b
diff --git a/userdiff.c b/userdiff.c
index 8578cb0d12..306286df35 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -168,6 +168,43 @@ PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+
+PATTERNS("javascript",
+	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
+	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
+	 /* don't match statement */
+	 "!;\n"
+	 /* match normal function or named export for function in ECMA2015 */
+	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
+	 /* match JavaScript variable declaration with a lambda expression at top level */
+	 "^((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
+		"(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
+	 /* match object's property assignment by anonymous function and CommonJS exports for named function */
+	 "^((module\\.)?[$_[:alpha:]][$_[:alnum:]]*\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(async[\t ]+)?(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
+	 /* match assign function to LHS with explicit function keyword */
+	 "^(.*=[\t ]*function[\t ]*([$_[:alnum:]]+[\t ]*)?\\(.*)\n"
+	 /* popular unit testing framework test case pattern. Most of framework pattern is match by regex for "function in class" */
+	 "^[\t ]*(QUnit.test\\(.*)\n"
+	 /* don't match the function in class or in object literal, which has more than one ident level */
+	 "!^(\t{2,}|[ ]{5,})\n"
+	 /* match normal function in object literal */
+	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function.*)\n"
+	 /* don't match chained method call */
+	 "!^[\t ]*[$_[:alpha:]][$_[:alnum:]][\t ]*\\(.*\\)\\.\n"
+	 /* match function in class and ES5 method shorthand */
+	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",
+	 /* word regex */
+	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and its big version */
+	 "0[xXoObB][_0-9a-fA-F]+n?"
+	 /* DecimalLiteral and its big version*/
+	 "|[0-9][_0-9]*(\\.[0-9][_0-9]*|n)?([eE][+-]?[_0-9]+)?"
+	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
+	 /* punctuations */
+	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
+	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
+	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
+	 /* identifiers */
+	 "|[$_[:alpha:]][$_[:alnum:]]*"),
 PATTERNS("markdown",
 	 "^ {0,3}#{1,6}[ \t].*",
 	 /* -- */
-- 
2.35.1.273.ge6ebfd0e8c.dirty


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v3] Add a diff driver for JavaScript languages.
  2022-04-03 13:25 ` [GSoC][PATCH v3] " xing zhi jiang
@ 2022-04-03 14:40   ` Johannes Sixt
  2022-04-04  7:12   ` Ævar Arnfjörð Bjarmason
  2022-04-04 17:32   ` Glen Choo
  2 siblings, 0 replies; 25+ messages in thread
From: Johannes Sixt @ 2022-04-03 14:40 UTC (permalink / raw)
  To: xing zhi jiang; +Cc: chooglen, git

Am 03.04.22 um 15:25 schrieb xing zhi jiang:
> In the xfunction part that matches normal functions,
> a variable declaration with an assignment of function, the function declaration
> in the class, named export for function in ECMA2015, CommonJS exports for
> named function, and also the function is object literal's property[1].
> The special match is this would match the popular framework test case
> pattern(most is same as regex for matching function in class).
> 
> And in the word regex part, that matches numbers, punctuations, and also the
> JavaScript identifier.
> This part reference the formal ECMA specification[2].
> 
> [1]https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
> [2]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
> 
> Signed-off-by: xing zhi jiang <a97410985new@gmail.com>
> ---
> I write a simple python script to help me observe the diff's results easier. 
> And the result is in https://github.com/a97410985/user_diff_observe_tool.
> I observe the six popular JavaScript projects - Jquery, NodeJS, React, TensorflowJS, 
> Vue, and Webpack. The result is not perfect but acceptable.
> Most of the mismatch is related to the function in class. Ex:
> https://github.com/a97410985/user_diff_observe_tool/blob/092bd14c849b9a3f61bb872d730e38b58cc9c303/diff_result/jquery_result.diff#L1094.
> But this is inevitable.
> And besides, apply the code reviews suggestions. I add some new regex.
> 1. "!^[\t ]*[$_[:alpha:]][$_[:alnum:]][\t ]*\\(.*\\)\\.\n"
>    this is for the case below
>    `
>     function RIGHT(a, b) {
> 
>         WRONG(spy.calls.count()).toBe(1)
>         return a + b; // ChangeMe
>     }
>    `
>    if we don't skip chained function call. This would happen mismatch.

OK, but see below.

> 2. "^[\t ]*(QUnit.test\\(.*)\n"
>    I oberve there are many test case pattern in JavaScript. And most of popular JavaScript
>    unit test framework's test case pattern is function call can be matched by the regex, which
>    match "function in class". So it is no problem. But `QUnit framework` don't matched by that.
>    Then add this regex into JavaScript userdiff.
>    And the reason includes test case pattern has two
>    1. This is inevitable. Because we can't distinguish that with function in class.
>    2. This can help the user understand what change is happening in which test case. And if 
>       don't match the test case, it has the large possibility to match no related function.

Good. I consider this like matching a language keyword. If additional
patterns of this kind occur, they can be added later, but we must make
sure that this does not proliferate.

>       
> And I move the regex, which match normal function in object literal, under the regex 
> ""!^(\t{2,}|[ ]{5,})\n".
> Because I found it would many mismatch if function in object literal has too many indent level(
> the function body usually just few lines of code, and the below has large of code. If has change
> on below, it would happen mismatch).
> And there are two example for hunk header that match test case pattern.

OK.

> diff --git a/t/t4018/javascript-skip-chained-function b/t/t4018/javascript-skip-chained-function
> new file mode 100644
> index 0000000000..4ce23901cf
> --- /dev/null
> +++ b/t/t4018/javascript-skip-chained-function
> @@ -0,0 +1,5 @@
> +function RIGHT(a, b) {
> +
> +  expect(spy.calls.count()).toBe(1)
> +  return a + b; // ChangeMe
> +}

This is the test case that attempts to make sure that the chained
function calls are not matched. But since there is no line between
ChangeMe and the not-to-be-matched line, this test case would not detect
such an incorrect match. You have to insert a line between expect... and
return...ChangeMe

> diff --git a/userdiff.c b/userdiff.c
> index 8578cb0d12..306286df35 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -168,6 +168,43 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +
> +PATTERNS("javascript",
> +	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
> +	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
> +	 /* don't match statement */
> +	 "!;\n"
> +	 /* match normal function or named export for function in ECMA2015 */
> +	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
> +	 /* match JavaScript variable declaration with a lambda expression at top level */
> +	 "^((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> +		"(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
> +	 /* match object's property assignment by anonymous function and CommonJS exports for named function */
> +	 "^((module\\.)?[$_[:alpha:]][$_[:alnum:]]*\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(async[\t ]+)?(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
> +	 /* match assign function to LHS with explicit function keyword */
> +	 "^(.*=[\t ]*function[\t ]*([$_[:alnum:]]+[\t ]*)?\\(.*)\n"
> +	 /* popular unit testing framework test case pattern. Most of framework pattern is match by regex for "function in class" */
> +	 "^[\t ]*(QUnit.test\\(.*)\n"
> +	 /* don't match the function in class or in object literal, which has more than one ident level */
> +	 "!^(\t{2,}|[ ]{5,})\n"
> +	 /* match normal function in object literal */
> +	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function.*)\n"
> +	 /* don't match chained method call */
> +	 "!^[\t ]*[$_[:alpha:]][$_[:alnum:]][\t ]*\\(.*\\)\\.\n"
> +	 /* match function in class and ES5 method shorthand */
> +	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",

No objections to these expressions.

> +	 /* word regex */
> +	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and its big version */
> +	 "0[xXoObB][_0-9a-fA-F]+n?"
> +	 /* DecimalLiteral and its big version*/
> +	 "|[0-9][_0-9]*(\\.[0-9][_0-9]*|n)?([eE][+-]?[_0-9]+)?"
> +	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
> +	 /* punctuations */
> +	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
> +	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
> +	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
> +	 /* identifiers */
> +	 "|[$_[:alpha:]][$_[:alnum:]]*"),

These word-regexs look good.

>  PATTERNS("markdown",
>  	 "^ {0,3}#{1,6}[ \t].*",
>  	 /* -- */

This round looks quite good. With the one issue I noted above fixed,
this should be good to go, IMO.

-- Hannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v3] Add a diff driver for JavaScript languages.
  2022-04-03 13:25 ` [GSoC][PATCH v3] " xing zhi jiang
  2022-04-03 14:40   ` Johannes Sixt
@ 2022-04-04  7:12   ` Ævar Arnfjörð Bjarmason
  2022-04-04 20:29     ` Johannes Sixt
  2022-04-04 17:32   ` Glen Choo
  2 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-04-04  7:12 UTC (permalink / raw)
  To: xing zhi jiang; +Cc: git, j6t, chooglen


On Sun, Apr 03 2022, xing zhi jiang wrote:

Aside from what Johannes Sixt mentioned:

> +PATTERNS("javascript",
> +	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
> +	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
> +	 /* don't match statement */
> +	 "!;\n"
> +	 /* match normal function or named export for function in ECMA2015 */
> +	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
> +	 /* match JavaScript variable declaration with a lambda expression at top level */
> +	 "^((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> +		"(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
> +	 /* match object's property assignment by anonymous function and CommonJS exports for named function */
> +	 "^((module\\.)?[$_[:alpha:]][$_[:alnum:]]*\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(async[\t ]+)?(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
> +	 /* match assign function to LHS with explicit function keyword */
> +	 "^(.*=[\t ]*function[\t ]*([$_[:alnum:]]+[\t ]*)?\\(.*)\n"
> +	 /* popular unit testing framework test case pattern. Most of framework pattern is match by regex for "function in class" */
> +	 "^[\t ]*(QUnit.test\\(.*)\n"
> +	 /* don't match the function in class or in object literal, which has more than one ident level */
> +	 "!^(\t{2,}|[ ]{5,})\n"
> +	 /* match normal function in object literal */
> +	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function.*)\n"
> +	 /* don't match chained method call */
> +	 "!^[\t ]*[$_[:alpha:]][$_[:alnum:]][\t ]*\\(.*\\)\\.\n"
> +	 /* match function in class and ES5 method shorthand */
> +	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",
> +	 /* word regex */
> +	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and its big version */
> +	 "0[xXoObB][_0-9a-fA-F]+n?"
> +	 /* DecimalLiteral and its big version*/
> +	 "|[0-9][_0-9]*(\\.[0-9][_0-9]*|n)?([eE][+-]?[_0-9]+)?"
> +	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
> +	 /* punctuations */
> +	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
> +	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
> +	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
> +	 /* identifiers */
> +	 "|[$_[:alpha:]][$_[:alnum:]]*"),
>  PATTERNS("markdown",
>  	 "^ {0,3}#{1,6}[ \t].*",
>  	 /* -- */<

While we don't use helper macros for these currently there's no reason
we can't, I thin the above might be more readable with e.g.:

	#define JS_AA "[$_[:alpha:]][$_[:alnum:]]"

Which would make this:
	
	+PATTERNS("javascript",
	+	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
	+	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
	+	 /* don't match statement */
	+	 "!;\n"
	+	 /* match normal function or named export for function in ECMA2015 */
	+	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*" JS_AA "*[\t ]*\\(.*)\n"
	+	 /* match JavaScript variable declaration with a lambda expression at top level */
	+	 "^((const|let|var)[\t ]*" JS_AA "*[\t ]*=[\t ]*"
	+		"(\\(.*\\)|" JS_AA "*)[\t ]*=>[\t ]*\\{?)\n"
	+	 /* match object's property assignment by anonymous function and CommonJS exports for named function */
	+	 "^((module\\.)?" JS_AA "*\\." JS_AA "*[\t ]*=[\t ]*(async[\t ]+)?(\\(.*\\)|" JS_AA "*)[\t ]*=>.*)\n"
	+	 /* match assign function to LHS with explicit function keyword */
	+	 "^(.*=[\t ]*function[\t ]*([$_[:alnum:]]+[\t ]*)?\\(.*)\n"
	+	 /* popular unit testing framework test case pattern. Most of framework pattern is match by regex for "function in class" */

Wry try to stick to wrapping at 80 characters, so some of these comments
should really be wrapped (see CodingGuidelines for the multi-line
comment style we use).

	+	 "^[\t ]*(QUnit.test\\(.*)\n"
	+	 /* don't match the function in class or in object literal, which has more than one ident level */
	+	 "!^(\t{2,}|[ ]{5,})\n"
	+	 /* match normal function in object literal */
	+	 "^[\t ]*(" JS_AA "*[\t ]*:[\t ]*function.*)\n"
	+	 /* don't match chained method call */
	+	 "!^[\t ]*" JS_AA "[\t ]*\\(.*\\)\\.\n"
	+	 /* match function in class and ES5 method shorthand */
	+	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?" JS_AA "*[\t ]*\\(.*)",
	+	 /* word regex */
	+	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and its big version */
	+	 "0[xXoObB][_0-9a-fA-F]+n?"
	+	 /* DecimalLiteral and its big version*/
	+	 "|[0-9][_0-9]*(\\.[0-9][_0-9]*|n)?([eE][+-]?[_0-9]+)?"
	+	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
	+	 /* punctuations */
	+	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
	+	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
	+	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
	+	 /* identifiers */
	+	 "|" JS_AA "*"),
	
Just a thought, I wonder how much line-noisy we could make this thing in
general if we defined some common patterns with such helpers.

Anyway, insted of :alnum:and :alpha: don't you really mean [a-zA-Z0-9]
and [a-zA-Z]. I.e. do you *really* want to have this different depending
on the user's locale?

I haven't tested, but see the LC_CTYPE in gettext.c, so I'm fairly sure
that'll happen...


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v3] Add a diff driver for JavaScript languages.
  2022-04-03 13:25 ` [GSoC][PATCH v3] " xing zhi jiang
  2022-04-03 14:40   ` Johannes Sixt
  2022-04-04  7:12   ` Ævar Arnfjörð Bjarmason
@ 2022-04-04 17:32   ` Glen Choo
  2 siblings, 0 replies; 25+ messages in thread
From: Glen Choo @ 2022-04-04 17:32 UTC (permalink / raw)
  To: xing zhi jiang, git; +Cc: j6t


I'm pretty happy to see follow up on this patch, I think it'll be very
useful. Thanks! I think this patch has a fairly consistent direction
now; though I'm undecided on whether or not I agree with the direction,
I won't block this from going through.

So treat these comments as suggestions and additional JS info for the
folks who actually matter :)

xing zhi jiang <a97410985new@gmail.com> writes:

> In the xfunction part that matches normal functions,
> a variable declaration with an assignment of function, the function declaration
> in the class, named export for function in ECMA2015, CommonJS exports for
> named function, and also the function is object literal's property[1].
> The special match is this would match the popular framework test case
> pattern(most is same as regex for matching function in class).

With this paragraph and..

> 2. "^[\t ]*(QUnit.test\\(.*)\n"

this pattern, it looks like the direction we've chosen to take is
"support popular libraries, even if this might detect the wrong thing in
vanilla JS", which sounds reasonable enough to me. I think the
experience would be a bit inconsistent for vanilla JS folks, but the
worst thing I can imagine is that we are a bit overeager to detect
headers.

As an aside, this is my first time hearing of QUnit, but I guess the JS
landscape is huge and changes too often for me to keep track.

>    I oberve there are many test case pattern in JavaScript. And most of popular JavaScript
>    unit test framework's test case pattern is function call can be matched by the regex, which
>    match "function in class". So it is no problem. But `QUnit framework` don't matched by that.
>    Then add this regex into JavaScript userdiff.
>    And the reason includes test case pattern has two
>    1. This is inevitable. Because we can't distinguish that with function in class.
>    2. This can help the user understand what change is happening in which test case. And if 
>       don't match the test case, it has the large possibility to match no related function.

Nit: since we're adding a pattern just for QUnit, it would be nice to
have a test for it. This would also have the nice benefit of showing why
QUnit can't make use of the other patterns.

> +	 /* popular unit testing framework test case pattern. Most of framework pattern is match by regex for "function in class" */
> +	 "^[\t ]*(QUnit.test\\(.*)\n"

This wasn't entirely clear to me on its own, but reading your ---
description, I believe you mean that other unit testing frameworks are
covered by another regex.

Wording suggestion, perhaps:

  /*
   * QUnit (popular unit testing framework) test case; most of the other
   * frameworks are matched by "function in class".
   */

As a technicality, there is no guarantee that the user will import the
QUnit library as the variable `QUnit`, though most users will probably
use `QUnit` out of convention, so I don't think this a problem [1].

[1] In case some additional context becomes useful to other reviewers:
 most JavaScript module systems do not enforce variable names when
 importing a module, so there is no guarantee that QUnit will be
 imported as the token 'QUnit', though this is often true by
 _convention_ because library authors often stick to a single name in
 their documentation.
 
 In 'old school JavaScript in the browser', libraries are often exported
 as single global variables. Those do have well-enforced names, because
 otherwise the globals would be unusable.
 
 In native ES modules (aka standardized, modern JavaScript supported by
 modern browsers and most frameworks), the library author can suggest a
 name using named exports, but with the caveats 1. the importer can
 alias the variable to a different name and 2. many authors use
 anonymous exports - either using a default export or asking users to
 import the entire library as a single, anonymous object.
 
 In CommonJS (typically used in NodeJS), one cannot name exports IIRC;
 an export is always a single, anonymous object.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v3] Add a diff driver for JavaScript languages.
  2022-04-04  7:12   ` Ævar Arnfjörð Bjarmason
@ 2022-04-04 20:29     ` Johannes Sixt
  2022-04-04 21:44       ` Junio C Hamano
  2022-04-05  2:22       ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 25+ messages in thread
From: Johannes Sixt @ 2022-04-04 20:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, xing zhi jiang; +Cc: git, chooglen

Am 04.04.22 um 09:12 schrieb Ævar Arnfjörð Bjarmason:
> While we don't use helper macros for these currently there's no reason
> we can't, I thin the above might be more readable with e.g.:
> 
> 	#define JS_AA "[$_[:alpha:]][$_[:alnum:]]"

Please consider including "identifier" somehow in the macro name. And
add the trailing '*', which...

> Which would make this:
> 	
> 	+PATTERNS("javascript",
> 	+	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
> 	+	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
> 	+	 /* don't match statement */
> 	+	 "!;\n"
> 	+	 /* match normal function or named export for function in ECMA2015 */
> 	+	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*" JS_AA "*[\t ]*\\(.*)\n"
> 	+	 /* match JavaScript variable declaration with a lambda expression at top level */
> 	+	 "^((const|let|var)[\t ]*" JS_AA "*[\t ]*=[\t ]*"
> 	+		"(\\(.*\\)|" JS_AA "*)[\t ]*=>[\t ]*\\{?)\n"
> 	+	 /* match object's property assignment by anonymous function and CommonJS exports for named function */
> 	+	 "^((module\\.)?" JS_AA "*\\." JS_AA "*[\t ]*=[\t ]*(async[\t ]+)?(\\(.*\\)|" JS_AA "*)[\t ]*=>.*)\n"
> 	+	 /* match assign function to LHS with explicit function keyword */
> 	+	 "^(.*=[\t ]*function[\t ]*([$_[:alnum:]]+[\t ]*)?\\(.*)\n"
> 	+	 /* popular unit testing framework test case pattern. Most of framework pattern is match by regex for "function in class" */
> 
> Wry try to stick to wrapping at 80 characters, so some of these comments
> should really be wrapped (see CodingGuidelines for the multi-line
> comment style we use).
> 
> 	+	 "^[\t ]*(QUnit.test\\(.*)\n"
> 	+	 /* don't match the function in class or in object literal, which has more than one ident level */
> 	+	 "!^(\t{2,}|[ ]{5,})\n"
> 	+	 /* match normal function in object literal */
> 	+	 "^[\t ]*(" JS_AA "*[\t ]*:[\t ]*function.*)\n"
> 	+	 /* don't match chained method call */
> 	+	 "!^[\t ]*" JS_AA "[\t ]*\\(.*\\)\\.\n"

... which makes me wonder why it is not present here. If that's an
oversight: nice catch!

> 	+	 /* match function in class and ES5 method shorthand */
> 	+	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?" JS_AA "*[\t ]*\\(.*)",
> 	+	 /* word regex */
> 	+	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and its big version */
> 	+	 "0[xXoObB][_0-9a-fA-F]+n?"
> 	+	 /* DecimalLiteral and its big version*/
> 	+	 "|[0-9][_0-9]*(\\.[0-9][_0-9]*|n)?([eE][+-]?[_0-9]+)?"
> 	+	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
> 	+	 /* punctuations */
> 	+	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
> 	+	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
> 	+	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
> 	+	 /* identifiers */
> 	+	 "|" JS_AA "*"),
> 	
> Just a thought, I wonder how much line-noisy we could make this thing in
> general if we defined some common patterns with such helpers.
> 
> Anyway, insted of :alnum:and :alpha: don't you really mean [a-zA-Z0-9]
> and [a-zA-Z]. I.e. do you *really* want to have this different depending
> on the user's locale?

That's worth considering.

> 
> I haven't tested, but see the LC_CTYPE in gettext.c, so I'm fairly sure
> that'll happen...
> 

-- Hannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v3] Add a diff driver for JavaScript languages.
  2022-04-04 20:29     ` Johannes Sixt
@ 2022-04-04 21:44       ` Junio C Hamano
  2022-04-05  2:22       ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2022-04-04 21:44 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Ævar Arnfjörð Bjarmason, xing zhi jiang, git,
	chooglen

Johannes Sixt <j6t@kdbg.org> writes:

> Am 04.04.22 um 09:12 schrieb Ævar Arnfjörð Bjarmason:
>> While we don't use helper macros for these currently there's no reason
>> we can't, I thin the above might be more readable with e.g.:
>> 
>> 	#define JS_AA "[$_[:alpha:]][$_[:alnum:]]"
>
> Please consider including "identifier" somehow in the macro name. And
> add the trailing '*', which...
>
> ... which makes me wonder why it is not present here. If that's an
> oversight: nice catch!

Yeah, and taken together with the locale worry, we shouldn't play
cute and use :alpha: and :alnum:, but spell them out, i.e.

	#define JS_IDENT "[$_a-zA-Z][$_a-zA-Z0-9]"

probably.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [GSoC][PATCH v3] Add a diff driver for JavaScript languages.
  2022-04-04 20:29     ` Johannes Sixt
  2022-04-04 21:44       ` Junio C Hamano
@ 2022-04-05  2:22       ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-04-05  2:22 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: xing zhi jiang, git, chooglen


On Mon, Apr 04 2022, Johannes Sixt wrote:

> Am 04.04.22 um 09:12 schrieb Ævar Arnfjörð Bjarmason:
>> While we don't use helper macros for these currently there's no reason
>> we can't, I thin the above might be more readable with e.g.:
>> 
>> 	#define JS_AA "[$_[:alpha:]][$_[:alnum:]]"
>
> Please consider including "identifier" somehow in the macro name. And
> add the trailing '*', which...

Indeed, although for something like this a cute short name is probably
OK, and we can just #undef it right afer.

>> Which would make this:
>> 	
>> 	+PATTERNS("javascript",
>> 	+	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
>> 	+	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
>> 	+	 /* don't match statement */
>> 	+	 "!;\n"
>> 	+	 /* match normal function or named export for function in ECMA2015 */
>> 	+	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*" JS_AA "*[\t ]*\\(.*)\n"
>> 	+	 /* match JavaScript variable declaration with a lambda expression at top level */
>> 	+	 "^((const|let|var)[\t ]*" JS_AA "*[\t ]*=[\t ]*"
>> 	+		"(\\(.*\\)|" JS_AA "*)[\t ]*=>[\t ]*\\{?)\n"
>> 	+	 /* match object's property assignment by anonymous function and CommonJS exports for named function */
>> 	+	 "^((module\\.)?" JS_AA "*\\." JS_AA "*[\t ]*=[\t ]*(async[\t ]+)?(\\(.*\\)|" JS_AA "*)[\t ]*=>.*)\n"
>> 	+	 /* match assign function to LHS with explicit function keyword */
>> 	+	 "^(.*=[\t ]*function[\t ]*([$_[:alnum:]]+[\t ]*)?\\(.*)\n"
>> 	+	 /* popular unit testing framework test case pattern. Most of framework pattern is match by regex for "function in class" */
>> 
>> Wry try to stick to wrapping at 80 characters, so some of these comments
>> should really be wrapped (see CodingGuidelines for the multi-line
>> comment style we use).
>> 
>> 	+	 "^[\t ]*(QUnit.test\\(.*)\n"
>> 	+	 /* don't match the function in class or in object literal, which has more than one ident level */
>> 	+	 "!^(\t{2,}|[ ]{5,})\n"
>> 	+	 /* match normal function in object literal */
>> 	+	 "^[\t ]*(" JS_AA "*[\t ]*:[\t ]*function.*)\n"
>> 	+	 /* don't match chained method call */
>> 	+	 "!^[\t ]*" JS_AA "[\t ]*\\(.*\\)\\.\n"
>
> ... which makes me wonder why it is not present here. If that's an
> oversight: nice catch!

*Nod*, I just did a dumb search replace and didn't notice that myself,
 but it's clearly making things easier to read.

Asanother thing I noticed: shouldn't that '.' in QUnit.test be escaped?
Presumably we don't want QUnitXtest or whatever.

>> 	+	 /* match function in class and ES5 method shorthand */
>> 	+	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?" JS_AA "*[\t ]*\\(.*)",
>> 	+	 /* word regex */
>> 	+	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and its big version */
>> 	+	 "0[xXoObB][_0-9a-fA-F]+n?"
>> 	+	 /* DecimalLiteral and its big version*/
>> 	+	 "|[0-9][_0-9]*(\\.[0-9][_0-9]*|n)?([eE][+-]?[_0-9]+)?"
>> 	+	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
>> 	+	 /* punctuations */
>> 	+	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
>> 	+	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
>> 	+	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
>> 	+	 /* identifiers */
>> 	+	 "|" JS_AA "*"),
>> 	
>> Just a thought, I wonder how much line-noisy we could make this thing in
>> general if we defined some common patterns with such helpers.
>> 
>> Anyway, insted of :alnum:and :alpha: don't you really mean [a-zA-Z0-9]
>> and [a-zA-Z]. I.e. do you *really* want to have this different depending
>> on the user's locale?
>
> That's worth considering.

If it's intentional it makes sense to add some locale-stressing tests to
the tests, i.e. if we really mean to match non-ASCII identifiers etc.

>> 
>> I haven't tested, but see the LC_CTYPE in gettext.c, so I'm fairly sure
>> that'll happen...
>> 
>
> -- Hannes


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-04-05  2:48 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-04 13:08 [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language xing zhi jiang
2022-03-04 13:08 ` [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages xing zhi jiang
2022-03-05 10:16   ` Johannes Sixt
2022-03-07 15:10     ` xing-zhi jiang
2022-03-08  6:46       ` Johannes Sixt
2022-03-12 16:59         ` xing zhi jiang
2022-03-05 13:41 ` [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language Johannes Sixt
2022-03-12 16:48 ` [GSoC][PATCH v2] Add a diff driver for JavaScript languages xing zhi jiang
2022-03-13 21:54   ` Johannes Sixt
2022-04-03 13:17     ` xing zhi jiang
2022-03-14 17:20   ` Glen Choo
2022-03-15  7:40     ` Johannes Sixt
2022-03-15 18:51       ` Glen Choo
2022-03-15 19:22         ` Junio C Hamano
2022-03-15 21:34           ` Glen Choo
2022-04-03 13:24             ` xing zhi jiang
2022-04-03 13:20         ` xing zhi jiang
2022-04-03 13:21     ` xing zhi jiang
2022-04-03 13:25 ` [GSoC][PATCH v3] " xing zhi jiang
2022-04-03 14:40   ` Johannes Sixt
2022-04-04  7:12   ` Ævar Arnfjörð Bjarmason
2022-04-04 20:29     ` Johannes Sixt
2022-04-04 21:44       ` Junio C Hamano
2022-04-05  2:22       ` Ævar Arnfjörð Bjarmason
2022-04-04 17:32   ` Glen Choo

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).