git@vger.kernel.org mailing list mirror (one of many)
* [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language
@ 2022-03-04 13:08 xing zhi jiang
  2022-03-04 13:08 ` [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages xing zhi jiang
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: xing zhi jiang @ 2022-03-04 13:08 UTC (permalink / raw)
  To: git; +Cc: a97410985new

I searched the git public-inbox and did not find any active patch for a
built-in userdiff driver for JavaScript (there is an unfinished patch from about
three years ago), so I picked this as my GSoC micro project.

Below are typical function patterns in JavaScript, each with
examples that match the pattern.
The examples come from popular JavaScript projects,
because I want to make sure the hunk headers work well
on real-world code.

Common function patterns for JavaScript
1. normal function
  `^[\t ]*((export[\t ]+)?((async|get|set)[\t ]+)?function[\t ]*([\t ]*\\*[\t ]*|[\t ]*)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)`
  example: 
  1. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L648
  2. https://github.com/mozilla/pdf.js/blob/ad4b2ce021277ff7cea8ec7e32775c65d74ee673/test/unit/evaluator_spec.js#L40
2. JavaScript variable declaration with a lambda expression
  `^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*[\t ])[\t ]*=>[\t ]*\\{?)`
   example:
   1. https://github.com/webpack/webpack/blob/2279c5a2105ea1498b83a4854919aefe1a28c553/lib/ChunkGraph.js#L91
   2. https://github.com/webpack/webpack/blob/2279c5a2105ea1498b83a4854919aefe1a28c553/lib/ChunkGraph.js#L122
    
   I found that functions are sometimes defined this way. But should this only match at the top level? Otherwise
   it may match inside a function, and the code at [1] would get the wrong function in its hunk header.

3. method exported by assigning an anonymous (arrow) function
  `^[\t ]*(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)`
   example:
   1. https://github.com/webpack/webpack/blob/c181294865dca01b28e6e316636fef5f2aad4eb6/lib/dependencies/DynamicExports.js#L17
   2. https://github.com/ajaxorg/ace/blob/d95725983b363a616c584237013dfd36eaadbba4/lib/ace/lib/dom.js#L37
4. expression that assigns a function to an LHS
  `^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)`
   example:
   1. https://github.com/ajaxorg/ace/blob/94422a4a892495564c56089af85019a8f8f24673/lib/ace/anchor.js#L102
   2. https://github.com/ajaxorg/ace/blob/d95725983b363a616c584237013dfd36eaadbba4/lib/ace/lib/dom.js#L37
   3. https://github.com/ajaxorg/ace/blob/4257621787b4253d6d493611f4ec5a37829da323/lib/ace/search.js#L350
   4. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L299
   
   Maybe this should only match at indent levels 0, 1, and 2? Otherwise it may match a function assignment nested inside another function.
5. normal function in object literal
  `^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)`
    1. https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
    2. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L242
6. function in class
  `^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)`
    
   This regex is tricky because a class method has no function keyword in JavaScript.
   A regex written to match methods also matches many things that are not function declarations,
   as in the examples below:
   1. non-function matches
     1. https://github.com/ajaxorg/ace/blob/94422a4a892495564c56089af85019a8f8f24673/lib/ace/anchor.js#L58
     2. https://github.com/ajaxorg/ace/blob/d95725983b363a616c584237013dfd36eaadbba4/lib/ace/lib/dom.js#L132
   2. functions in a class
     1. https://github.com/mozilla/pdf.js/blob/85ff7b117e04471c550914b7a13193ab7f7ecc6a/src/display/canvas.js#L1929
     2. https://github.com/webpack/webpack/blob/ccecc17c01af96edddb931a76e7a3b21ef2969d8/lib/Chunk.js#L179
     3. https://github.com/webpack/webpack/blob/612de998f186a9bb2fe8769a91678df689a0541e/lib/Module.js#L242
     4. https://github.com/mozilla/pdf.js/blob/5cf116a958548f6596674bf8d5ca0fe64aa2df3c/web/view_history.js#L75
     5. https://github.com/mozilla/pdf.js/blob/5cf116a958548f6596674bf8d5ca0fe64aa2df3c/web/view_history.js#L89
   
    My solution is to add some negation rules. One rule skips keywords that may be immediately
    followed by a parenthesis: `!^[ \t]*(if|do|while|for|with|switch|catch|import|return)`.
    Another negation rule is placed just before this "function in class" regex and skips lines indented more than
    one level, because most functions in a class are at one indent level (the class is at the top level). The rule is
    `!^(\t{2,}|[ ]{5,})`.
    
    But this is not enough, because a function call may also appear at one indent level. So a negation rule for skipping
    statements is needed: `!^.*;[ \t]*`. The bad news is that a JavaScript statement can end without a semicolon,
    so a function call can still be recognized as a function declaration if the code does not end its statements
    with semicolons.
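
The interaction between the negation rules and the positive patterns can be
sketched outside of git. This is only an illustration, not the code in
userdiff.c. It assumes that rules are tried in order and the first rule that
matches decides the result (a matching negation rule means "not a hunk header").
The rule order, the helper name hunk_header, and the translation of the POSIX
classes `[:alpha:]`/`[:alnum:]` into ASCII ranges are my assumptions; only
patterns 1 and 6 are included.

```python
import re

# Assumption: ASCII stand-in for [$_[:alpha:]][$_[:alnum:]]*, because
# Python's re module has no POSIX character classes.
IDENT = r"[$_A-Za-z][$_A-Za-z0-9]*"

# (negate, pattern) pairs, tried top to bottom. The order is an assumption
# based on the description above: the indent and statement negations sit
# just before the "function in class" pattern.
RULES = [
    (True, r"^[ \t]*(if|do|while|for|with|switch|catch|import|return)"),
    # 1. normal function
    (False, r"^[\t ]*((export[\t ]+)?((async|get|set)[\t ]+)?function[\t ]*"
            r"([\t ]*\*[\t ]*|[\t ]*)?" + IDENT + r"[\t ]*\(.*)"),
    (True, r"^(\t{2,}|[ ]{5,})"),  # more than one indent level
    (True, r"^.*;[ \t]*"),         # statement containing a semicolon
    # 6. function in class
    (False, r"^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?"
            + IDENT + r"[\t ]*\(.*)"),
]


def hunk_header(line):
    """Return the text that would be used as hunk header, or None."""
    for negate, pattern in RULES:
        match = re.search(pattern, line)
        if match:
            # First matching rule wins; a negation rule rejects the line.
            return None if negate else match.group(1)
    return None


print(hunk_header("  static async load(url) {"))  # accepted as a method
print(hunk_header("  if (ready) {"))               # rejected: keyword
print(hunk_header("        helper(x) {"))          # rejected: indent
print(hunk_header("  update(x);"))                 # rejected: semicolon
print(hunk_header("  update(x)"))                  # false positive
```

The last call shows the remaining weakness described above: a call without a
trailing semicolon is still taken as a declaration. In this sketch the keyword
rule, as written without a word boundary, also rejects names that merely start
with a keyword, such as `doWork(x) {`.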

Word patterns for JavaScript
For this part, I relied heavily on the formal ECMAScript specification[2].
JavaScript has some special syntax; for example, the digits of a number can be separated with underscores for readability[3].
It also has BigInt literals, which are numbers ending with an "n" character. So the number-related regexes differ
somewhat from those of other languages.
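
To show why numbers need their own treatment, here is a small sketch of a
number regex that accepts numeric separators and the BigInt suffix. It is a
hypothetical pattern written for this illustration and is not the word regex
from the patch; the names DIGITS, NUMBER and is_number are made up, and it
does not try to cover every corner of the specification.

```python
import re

# Decimal digits with optional single underscores between them.
DIGITS = r"[0-9](?:_?[0-9])*"

# Hypothetical number pattern: hex, octal, binary, decimal BigInt,
# then decimal with optional fraction and exponent.
NUMBER = re.compile(
    r"0[xX][0-9a-fA-F](?:_?[0-9a-fA-F])*n?"
    r"|0[oO][0-7](?:_?[0-7])*n?"
    r"|0[bB][01](?:_?[01])*n?"
    rf"|{DIGITS}n"
    rf"|(?:{DIGITS}(?:\.(?:{DIGITS})?)?|\.{DIGITS})(?:[eE][+-]?{DIGITS})?"
)


def is_number(token):
    """True if the whole token is one number under the sketch pattern."""
    return NUMBER.fullmatch(token) is not None


line = "let n = 1_000_000n + 0xFF_FF;"
print(NUMBER.findall(line))          # each literal stays one word
print(re.findall(r"[0-9]+", "1_000"))  # a plain digit run splits it in two
```

With a plain digit-run pattern, `1_000` falls apart into `1` and `000`, so a
word diff would show only part of the literal as changed; keeping the
separator and the `n` suffix inside the word avoids that.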

Finally, I have a fork of the git project on GitHub with CI results; all test cases pass[4].

[1] https://github.com/webpack/webpack/blob/2279c5a2105ea1498b83a4854919aefe1a28c553/lib/ChunkGraph.js#L279
[2] https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
[3] https://v8.dev/features/numeric-separators
[4] https://github.com/a97410985/git/actions/runs/1933091300

xing zhi jiang (1):
  Add a diff driver for JavaScript languages.

 .gitignore                                    |  1 +
 Documentation/gitattributes.txt               |  2 +
 ...avascript-assignment-of-anonymous-function |  4 ++
 .../javascript-assignment-of-arrow-function   |  4 ++
 .../javascript-assignment-of-named-function   |  4 ++
 t/t4018/javascript-async-function             |  4 ++
 t/t4018/javascript-export-async-function      |  4 ++
 t/t4018/javascript-export-function            |  4 ++
 t/t4018/javascript-exports-anomyous-function  |  4 ++
 .../javascript-exports-anomyous-function-2    |  4 ++
 t/t4018/javascript-exports-function           |  4 ++
 t/t4018/javascript-function                   |  4 ++
 t/t4018/javascript-function-2                 | 10 ++++
 t/t4018/javascript-function-belong-to-IIFE    |  6 +++
 t/t4018/javascript-function-in-class          |  6 +++
 t/t4018/javascript-function-in-class-2        | 11 ++++
 t/t4018/javascript-function-in-object-literal |  7 +++
 t/t4018/javascript-generator-function         |  4 ++
 t/t4018/javascript-generator-function-2       |  4 ++
 t/t4018/javascript-getter-function-in-class   |  6 +++
 t/t4018/javascript-setter-function-in-class   |  6 +++
 .../javascript-skip-function-call-statement   |  7 +++
 t/t4018/javascript-skip-keywords              | 34 ++++++++++++
 t/t4018/javascript-static-function-in-class   |  6 +++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/javascript/expect                     | 52 +++++++++++++++++++
 t/t4034/javascript/post                       | 32 ++++++++++++
 t/t4034/javascript/pre                        | 32 ++++++++++++
 userdiff.c                                    | 38 ++++++++++++++
 29 files changed, 305 insertions(+)
 create mode 100644 t/t4018/javascript-assignment-of-anonymous-function
 create mode 100644 t/t4018/javascript-assignment-of-arrow-function
 create mode 100644 t/t4018/javascript-assignment-of-named-function
 create mode 100644 t/t4018/javascript-async-function
 create mode 100644 t/t4018/javascript-export-async-function
 create mode 100644 t/t4018/javascript-export-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function
 create mode 100644 t/t4018/javascript-exports-anomyous-function-2
 create mode 100644 t/t4018/javascript-exports-function
 create mode 100644 t/t4018/javascript-function
 create mode 100644 t/t4018/javascript-function-2
 create mode 100644 t/t4018/javascript-function-belong-to-IIFE
 create mode 100644 t/t4018/javascript-function-in-class
 create mode 100644 t/t4018/javascript-function-in-class-2
 create mode 100644 t/t4018/javascript-function-in-object-literal
 create mode 100644 t/t4018/javascript-generator-function
 create mode 100644 t/t4018/javascript-generator-function-2
 create mode 100644 t/t4018/javascript-getter-function-in-class
 create mode 100644 t/t4018/javascript-setter-function-in-class
 create mode 100644 t/t4018/javascript-skip-function-call-statement
 create mode 100644 t/t4018/javascript-skip-keywords
 create mode 100644 t/t4018/javascript-static-function-in-class
 create mode 100644 t/t4034/javascript/expect
 create mode 100644 t/t4034/javascript/post
 create mode 100644 t/t4034/javascript/pre


base-commit: e6ebfd0e8cbbd10878070c8a356b5ad1b3ca464e
-- 
2.35.1.273.ge6ebfd0e8c.dirty



end of thread, other threads:[~2022-04-05  2:48 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-04 13:08 [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language xing zhi jiang
2022-03-04 13:08 ` [GSoC][PATCH 1/1] Add a diff driver for JavaScript languages xing zhi jiang
2022-03-05 10:16   ` Johannes Sixt
2022-03-07 15:10     ` xing-zhi jiang
2022-03-08  6:46       ` Johannes Sixt
2022-03-12 16:59         ` xing zhi jiang
2022-03-05 13:41 ` [GSoC][PATCH 0/1] userdiff: add buildin diff driver for JavaScript language Johannes Sixt
2022-03-12 16:48 ` [GSoC][PATCH v2] Add a diff driver for JavaScript languages xing zhi jiang
2022-03-13 21:54   ` Johannes Sixt
2022-04-03 13:17     ` xing zhi jiang
2022-03-14 17:20   ` Glen Choo
2022-03-15  7:40     ` Johannes Sixt
2022-03-15 18:51       ` Glen Choo
2022-03-15 19:22         ` Junio C Hamano
2022-03-15 21:34           ` Glen Choo
2022-04-03 13:24             ` xing zhi jiang
2022-04-03 13:20         ` xing zhi jiang
2022-04-03 13:21     ` xing zhi jiang
2022-04-03 13:25 ` [GSoC][PATCH v3] " xing zhi jiang
2022-04-03 14:40   ` Johannes Sixt
2022-04-04  7:12   ` Ævar Arnfjörð Bjarmason
2022-04-04 20:29     ` Johannes Sixt
2022-04-04 21:44       ` Junio C Hamano
2022-04-05  2:22       ` Ævar Arnfjörð Bjarmason
2022-04-04 17:32   ` Glen Choo
