git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v4] userdiff: improve java hunk header regex
@ 2021-08-10 19:09 Tassilo Horn
  2021-08-10 20:57 ` Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Tassilo Horn @ 2021-08-10 19:09 UTC (permalink / raw)
  To: git; +Cc: Tassilo Horn

Currently, the git diff hunk headers show the wrong method signature if the
method has a qualified return type, an array return type, or a generic return
type because the regex doesn't allow dots (.), [], or < and > in the return
type.  Also, type parameter declarations couldn't be matched.

Add several t4018 tests asserting the right hunk headers for increasingly
complex method signatures:

  public String[] myMethod(String[] RIGHT)
  public List<String> myMethod(String[] RIGHT)
  public <T> List<T> myMethod(T[] RIGHT)
  public <AType, B> Map<AType, B> myMethod(String[] RIGHT)
  public <AType, B> java.util.Map<AType, Map<B, B[]>> myMethod(String[] RIGHT)
  public List<? extends Comparable> myMethod(String[] RIGHT)
  public <T extends Serializable & Comparable<T>> List<T> myMethod(String[] RIGHT)

Signed-off-by: Tassilo Horn <tsdh@gnu.org>
---
 t/t4018/java-constructor             |  6 ++++++
 t/t4018/java-enum-constant           |  6 ++++++
 t/t4018/java-nested-field            |  6 ++++++
 t/t4018/java-return-array            |  6 ++++++
 t/t4018/java-return-generic          |  6 ++++++
 t/t4018/java-return-generic-bounded  |  6 ++++++
 t/t4018/java-return-generic-wildcart |  6 ++++++
 t/t4018/java-return-generic2         |  6 ++++++
 t/t4018/java-return-generic3         |  6 ++++++
 t/t4018/java-return-generic4         |  6 ++++++
 userdiff.c                           | 23 ++++++++++++++++++++++-
 11 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 t/t4018/java-constructor
 create mode 100644 t/t4018/java-enum-constant
 create mode 100644 t/t4018/java-nested-field
 create mode 100644 t/t4018/java-return-array
 create mode 100644 t/t4018/java-return-generic
 create mode 100644 t/t4018/java-return-generic-bounded
 create mode 100644 t/t4018/java-return-generic-wildcart
 create mode 100644 t/t4018/java-return-generic2
 create mode 100644 t/t4018/java-return-generic3
 create mode 100644 t/t4018/java-return-generic4

diff --git a/t/t4018/java-constructor b/t/t4018/java-constructor
new file mode 100644
index 0000000000..9daf7c5430
--- /dev/null
+++ b/t/t4018/java-constructor
@@ -0,0 +1,6 @@
+public class MyClass {
+    MyClass(String RIGHT) {
+        // Whatever
+        // ChangeMe
+    }
+}
diff --git a/t/t4018/java-enum-constant b/t/t4018/java-enum-constant
new file mode 100644
index 0000000000..a1931c8379
--- /dev/null
+++ b/t/t4018/java-enum-constant
@@ -0,0 +1,6 @@
+private enum RIGHT {
+    ONE,
+    TWO,
+    THREE,
+    ChangeMe
+}
diff --git a/t/t4018/java-nested-field b/t/t4018/java-nested-field
new file mode 100644
index 0000000000..d92d3ec688
--- /dev/null
+++ b/t/t4018/java-nested-field
@@ -0,0 +1,6 @@
+class MyExample {
+    private static class RIGHT {
+        // change an inner class field
+        String inner = "ChangeMe";
+    }
+}
diff --git a/t/t4018/java-return-array b/t/t4018/java-return-array
new file mode 100644
index 0000000000..747638b9a8
--- /dev/null
+++ b/t/t4018/java-return-array
@@ -0,0 +1,6 @@
+class MyExample {
+    public String[] myMethod(String[] RIGHT) {
+        // Whatever...
+        return new; // ChangeMe
+    }
+}
diff --git a/t/t4018/java-return-generic b/t/t4018/java-return-generic
new file mode 100644
index 0000000000..161dd8338f
--- /dev/null
+++ b/t/t4018/java-return-generic
@@ -0,0 +1,6 @@
+class MyExample {
+    public List<String> myMethod(String[] RIGHT) {
+        // Whatever...
+        return Arrays.asList("ChangeMe");
+    }
+}
diff --git a/t/t4018/java-return-generic-bounded b/t/t4018/java-return-generic-bounded
new file mode 100644
index 0000000000..440115a788
--- /dev/null
+++ b/t/t4018/java-return-generic-bounded
@@ -0,0 +1,6 @@
+class MyExample {
+    public <T extends Serializable & Comparable<T>> List<T> myMethod(String[] RIGHT) {
+        // Whatever...
+        return (List<T>) Arrays.asList("ChangeMe");
+    }
+}
diff --git a/t/t4018/java-return-generic-wildcart b/t/t4018/java-return-generic-wildcart
new file mode 100644
index 0000000000..2d682e1e2b
--- /dev/null
+++ b/t/t4018/java-return-generic-wildcart
@@ -0,0 +1,6 @@
+class MyExample {
+    public List<? extends Comparable> myMethod(String[] RIGHT) {
+        // Whatever...
+        return Arrays.asList("ChangeMe");
+    }
+}
diff --git a/t/t4018/java-return-generic2 b/t/t4018/java-return-generic2
new file mode 100644
index 0000000000..7109c27456
--- /dev/null
+++ b/t/t4018/java-return-generic2
@@ -0,0 +1,6 @@
+class MyExample {
+    public <T> List<T> myMethod(T[] RIGHT) {
+        // Whatever...
+        return (List<T>) Arrays.asList("ChangeMe");
+    }
+}
diff --git a/t/t4018/java-return-generic3 b/t/t4018/java-return-generic3
new file mode 100644
index 0000000000..849f116f50
--- /dev/null
+++ b/t/t4018/java-return-generic3
@@ -0,0 +1,6 @@
+class MyExample {
+    public <AType, B> Map<AType, B> myMethod(String[] RIGHT) {
+        // Whatever...
+        return new java.util.HashMap<>(); // ChangeMe
+    }
+}
diff --git a/t/t4018/java-return-generic4 b/t/t4018/java-return-generic4
new file mode 100644
index 0000000000..1b22c8c037
--- /dev/null
+++ b/t/t4018/java-return-generic4
@@ -0,0 +1,6 @@
+class MyExample {
+    public <AType, B> java.util.Map<AType, Map<B, B[]>> myMethod(String[] RIGHT) {
+        // Whatever...
+        return new java.util.HashMap<>(); // ChangeMe
+    }
+}
diff --git a/userdiff.c b/userdiff.c
index 3c3bbe38b0..9bd751b7d2 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -142,7 +142,28 @@ PATTERNS("html",
 	 "[^<>= \t]+"),
 PATTERNS("java",
 	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
-	 "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
+         "^[ \t]*("
+         /* Class, enum, and interface declarations: */
+         /*   optional modifiers: public */
+         "(([a-z]+[ \t]+)*"
+         /*   the kind of declaration */
+         "(class|enum|interface)[ \t]+"
+         /*   the name */
+         "[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
+         /* Method & constructor signatures: */
+         /*   optional modifiers: public static */
+         "|(([a-z]+[ \t]+)*"
+         /*   type params and return types for methods but not constructors */
+         "("
+         /*     optional type parameters: <A, B extends Comparable<B>> */
+         "(<[A-Za-z0-9_,.&<> \t]+>[ \t]+)?"
+         /*     return type: java.util.Map<A, B[]> or List<?> */
+         "([A-Za-z_]([A-Za-z_0-9<>,.?]|\\[[ \t]*\\])*[ \t]+)+"
+         /*   end of type params and return type */
+         ")?"
+         /*   the method name followed by the parameter list: myMethod(...) */
+         "[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
+         ")$",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
-- 
2.32.0



* Re: [PATCH v4] userdiff: improve java hunk header regex
  2021-08-10 19:09 [PATCH v4] userdiff: improve java hunk header regex Tassilo Horn
@ 2021-08-10 20:57 ` Johannes Sixt
  2021-08-10 22:12   ` Re* " Junio C Hamano
  2021-08-11  5:22   ` Tassilo Horn
  0 siblings, 2 replies; 9+ messages in thread
From: Johannes Sixt @ 2021-08-10 20:57 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: git

On 10.08.21 at 21:09, Tassilo Horn wrote:
> Currently, the git diff hunk headers show the wrong method signature if the
> method has a qualified return type, an array return type, or a generic return
> type because the regex doesn't allow dots (.), [], or < and > in the return
> type.  Also, type parameter declarations couldn't be matched.
> 
> Add several t4018 tests asserting the right hunk headers for increasingly
> complex method signatures:
> 
>   public String[] myMethod(String[] RIGHT)
>   public List<String> myMethod(String[] RIGHT)
>   public <T> List<T> myMethod(T[] RIGHT)
>   public <AType, B> Map<AType, B> myMethod(String[] RIGHT)
>   public <AType, B> java.util.Map<AType, Map<B, B[]>> myMethod(String[] RIGHT)
>   public List<? extends Comparable> myMethod(String[] RIGHT)
>   public <T extends Serializable & Comparable<T>> List<T> myMethod(String[] RIGHT)
> 
> Signed-off-by: Tassilo Horn <tsdh@gnu.org>
> ---
>  t/t4018/java-constructor             |  6 ++++++
>  t/t4018/java-enum-constant           |  6 ++++++
>  t/t4018/java-nested-field            |  6 ++++++
>  t/t4018/java-return-array            |  6 ++++++
>  t/t4018/java-return-generic          |  6 ++++++
>  t/t4018/java-return-generic-bounded  |  6 ++++++
>  t/t4018/java-return-generic-wildcart |  6 ++++++
>  t/t4018/java-return-generic2         |  6 ++++++
>  t/t4018/java-return-generic3         |  6 ++++++
>  t/t4018/java-return-generic4         |  6 ++++++
>  userdiff.c                           | 23 ++++++++++++++++++++++-
>  11 files changed, 82 insertions(+), 1 deletion(-)
>  create mode 100644 t/t4018/java-constructor
>  create mode 100644 t/t4018/java-enum-constant
>  create mode 100644 t/t4018/java-nested-field
>  create mode 100644 t/t4018/java-return-array
>  create mode 100644 t/t4018/java-return-generic
>  create mode 100644 t/t4018/java-return-generic-bounded
>  create mode 100644 t/t4018/java-return-generic-wildcart
>  create mode 100644 t/t4018/java-return-generic2
>  create mode 100644 t/t4018/java-return-generic3
>  create mode 100644 t/t4018/java-return-generic4
> 

These new tests are very much appreciated. You do not have to go wild
with that many return type tests; IMO, the simple one and the most
complicated one should do it. (And btw, s/cart/card/)

> diff --git a/t/t4018/java-return-array b/t/t4018/java-return-array
> new file mode 100644
> index 0000000000..747638b9a8
> --- /dev/null
> +++ b/t/t4018/java-return-array
> @@ -0,0 +1,6 @@
> +class MyExample {
> +    public String[] myMethod(String[] RIGHT) {
> +        // Whatever...
> +        return new; // ChangeMe
> +    }
> +}
> diff --git a/userdiff.c b/userdiff.c
> index 3c3bbe38b0..9bd751b7d2 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -142,7 +142,28 @@ PATTERNS("html",
>  	 "[^<>= \t]+"),
>  PATTERNS("java",
>  	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
> -	 "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
> +         "^[ \t]*("
> +         /* Class, enum, and interface declarations: */
> +         /*   optional modifiers: public */
> +         "(([a-z]+[ \t]+)*"
> +         /*   the kind of declaration */
> +         "(class|enum|interface)[ \t]+"
> +         /*   the name */
> +         "[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
> +         /* Method & constructor signatures: */
> +         /*   optional modifiers: public static */
> +         "|(([a-z]+[ \t]+)*"
> +         /*   type params and return types for methods but not constructors */
> +         "("
> +         /*     optional type parameters: <A, B extends Comparable<B>> */
> +         "(<[A-Za-z0-9_,.&<> \t]+>[ \t]+)?"
> +         /*     return type: java.util.Map<A, B[]> or List<?> */
> +         "([A-Za-z_]([A-Za-z_0-9<>,.?]|\\[[ \t]*\\])*[ \t]+)+"
> +         /*   end of type params and return type */
> +         ")?"
> +         /*   the method name followed by the parameter list: myMethod(...) */
> +         "[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
> +         ")$",

I don't see the point in this complicated regex. Please recall that it
will be applied only to syntactically correct Java text. Therefore, you
do not have to implement all syntactical corner cases, just be
sufficiently permissive.

What is wrong with

	"^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",

i.e. take every "token" until an identifier followed by an opening
parenthesis is found. Can types in Java contain parentheses? That would
make my suggested simplified regex too permissive, but otherwise it
would do its job, I would think.
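
For anyone who wants to try the suggestion outside of git, here is a
minimal sketch using POSIX regcomp() with REG_EXTENDED. The helper
line_matches() and the array name simple_java_funcname are made up for
this illustration; only the pattern string is the one suggested above.
It is a quick check, not a replacement for the t4018 tests.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The simplified pattern suggested above, written on one line. */
static const char simple_java_funcname[] =
	"^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+"
	"[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$";

/* Hypothetical helper: returns 1 if `line` matches `pattern`, else 0. */
static int line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int rc = regcomp(&re, pattern, REG_EXTENDED);
	int hit;

	assert(rc == 0); /* the pattern itself must compile */
	hit = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

Calling line_matches(simple_java_funcname, ...) on a signature line
with an array or generic return type should return 1, while a complete
statement such as "someMethodCall();" returns 0.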

-- Hannes


* Re* [PATCH v4] userdiff: improve java hunk header regex
  2021-08-10 20:57 ` Johannes Sixt
@ 2021-08-10 22:12   ` Junio C Hamano
  2021-08-11  7:14     ` Johannes Sixt
  2021-08-11  5:22   ` Tassilo Horn
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2021-08-10 22:12 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Tassilo Horn, git

Johannes Sixt <j6t@kdbg.org> writes:

> I don't see the point in this complicated regex. Please recall that it
> will be applied only to syntactically correct Java text. Therefore, you
> do not have to implement all syntactical corner cases, just be
> sufficiently permissive.

Good suggestion.  We may want to mention the above principle as a
comment near the top of the patterns array.

> What is wrong with
>
> 	"^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
>
> i.e. take every "token" until an identifier followed by an opening
> parenthesis is found. Can types in Java contain parentheses? That would
> make my suggested simplified regex too permissive, but otherwise it
> would do its job, I would think.

Thanks.

---- >8 -------- >8 -------- >8 -------- >8 -------- >8 --------
Subject: userdiff: comment on the builtin patterns

Remind developers that they do not need to go overboard to implement
patterns to prepare for invalid constructs.  They only have to be
sufficiently permissive, assuming that the payload is syntactically
correct.

Text stolen mostly from Johannes Sixt.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 userdiff.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git c/userdiff.c w/userdiff.c
index d9b2ba752f..1a6d27fda6 100644
--- c/userdiff.c
+++ w/userdiff.c
@@ -13,6 +13,16 @@ static int drivers_alloc;
 #define IPATTERN(name, pattern, word_regex)			\
 	{ name, NULL, -1, { pattern, REG_EXTENDED | REG_ICASE }, \
 	  word_regex "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" }
+
+/*
+ * Built-in drivers for various languages, sorted by their names
+ * (except that the "default" is left at the end).
+ *
+ * When writing or updating patterns, assume that the contents these
+ * patterns are applied to are syntactically correct.  You do not have
+ * to implement all syntactical corner cases---the patterns have to be
+ * sufficiently permissive.
+ */
 static struct userdiff_driver builtin_drivers[] = {
 IPATTERN("ada",
 	 "!^(.*[ \t])?(is[ \t]+new|renames|is[ \t]+separate)([ \t].*)?$\n"


* Re: [PATCH v4] userdiff: improve java hunk header regex
  2021-08-10 20:57 ` Johannes Sixt
  2021-08-10 22:12   ` Re* " Junio C Hamano
@ 2021-08-11  5:22   ` Tassilo Horn
  2021-08-11  7:34     ` Johannes Sixt
  1 sibling, 1 reply; 9+ messages in thread
From: Tassilo Horn @ 2021-08-11  5:22 UTC (permalink / raw)
  To: Johannes Sixt, Junio C Hamano; +Cc: git

Johannes Sixt <j6t@kdbg.org> writes:

Hi Hannes & Junio,

> These new tests are very much appreciated. You do not have to go wild
> with that many return type tests; IMO, the simple one and the most
> complicated one should do it. (And btw, s/cart/card/)

Well, they appeared naturally as a result during development and made it
easier to spot errors when you know up to which level of complexity it
still worked.  Is there a stronger reason to remove tests which might
not be needed, e.g., runtime cost on some CI machines?

>> -	 "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
>> +         "^[ \t]*("
>> +         /* Class, enum, and interface declarations: */
>> +         /*   optional modifiers: public */
>> +         "(([a-z]+[ \t]+)*"
>> +         /*   the kind of declaration */
>> +         "(class|enum|interface)[ \t]+"
>> +         /*   the name */
>> +         "[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
>> +         /* Method & constructor signatures: */
>> +         /*   optional modifiers: public static */
>> +         "|(([a-z]+[ \t]+)*"
>> +         /*   type params and return types for methods but not constructors */
>> +         "("
>> +         /*     optional type parameters: <A, B extends Comparable<B>> */
>> +         "(<[A-Za-z0-9_,.&<> \t]+>[ \t]+)?"
>> +         /*     return type: java.util.Map<A, B[]> or List<?> */
>> +         "([A-Za-z_]([A-Za-z_0-9<>,.?]|\\[[ \t]*\\])*[ \t]+)+"
>> +         /*   end of type params and return type */
>> +         ")?"
>> +         /*   the method name followed by the parameter list: myMethod(...) */
>> +         "[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
>> +         ")$",
>
> I don't see the point in this complicated regex. Please recall that it
> will be applied only to syntactically correct Java text. Therefore,
> you do not have to implement all syntactical corner cases, just be
> sufficiently permissive.

I actually find it easier to understand if it is broken up into more
concrete alternatives and parts which are commented instead of one
opaque "permissively match everything in one alternative" regex.  It
shows the intent of what you want to match.  But YMMV and since Junio
agrees with you, I'm fine with that approach.

> What is wrong with
>
> 	"^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",

That doesn't work for

  <T> List<T> foo()

or

  <T extends Foo & Bar> T foo()

so at least it needs to include &<> in the first group, too.

Also, it doesn't match class/enum/interface declarations anymore, so

  class Foo {
    String x = "ChangeMe";
  }

will have an empty hunk header.

Another thing I've noticed (with my suggested patch) is that I should
not try to match constructor signatures.  I think that's impossible
because they are indistinguishable from method calls, e.g., in

  public class MyClass {
      MyClass(String RIGHT) {
          someMethodCall();
          someOtherMethod(17)
              .doThat();
          // Whatever
          // ChangeMe
      }
  }

there is no regex way to prefer MyClass(String RIGHT) over
someOtherMethod().
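
The ambiguity can be reproduced with a small sketch. The pattern below
is for illustration only and is not taken from any version of the
patch: it is simply "identifier followed by an opening parenthesis"
with no return type required, which is what matching a constructor
would need. The names no_return_type and line_matches() are
hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Illustrative pattern (an assumption, not from the patch): an
 * identifier followed by "(" and no ";" up to the end of the line.
 */
static const char no_return_type[] =
	"^[ \t]*([A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$";

/* Hypothetical helper: returns 1 if `line` matches `pattern`, else 0. */
static int line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int rc = regcomp(&re, pattern, REG_EXTENDED);
	int hit;

	assert(rc == 0); /* the pattern itself must compile */
	hit = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

Both "      MyClass(String RIGHT) {" and "          someOtherMethod(17)"
match, because the call continues on the next line and so carries no
semicolon. Only "          someMethodCall();" is rejected. Since the
hunk header is taken from the nearest matching line above the change,
the method call would win over the constructor.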

So all in all, I'd propose this version in the next patch version:

--8<---------------cut here---------------start------------->8---
PATTERNS("java",
	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
         "^[ \t]*("
         /* Class, enum, and interface declarations */
         "(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
         /* Method definitions; note that constructor signatures are not */
         /* matched because they are indistinguishable from method calls. */
         "|(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
         ")$",
	 /* -- */
	 "[a-zA-Z_][a-zA-Z0-9_]*"
	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
	 "|[-+*/<>%&^|=!]="
	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
--8<---------------cut here---------------end--------------->8---

That works for all my test cases (which I have also altered to include
the method calls from above before the ChangeMe) except for
java-constructor where it shows

  public class MyClass {

instead of

      MyClass(String RIGHT) {

in the hunk header which is expected as explained earlier and in the
comment.

Does that seem like a good middle ground?

Bye,
Tassilo


* Re: Re* [PATCH v4] userdiff: improve java hunk header regex
  2021-08-10 22:12   ` Re* " Junio C Hamano
@ 2021-08-11  7:14     ` Johannes Sixt
  2021-08-11 16:04       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2021-08-11  7:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tassilo Horn, git

On 11.08.21 at 00:12, Junio C Hamano wrote:
> Subject: userdiff: comment on the builtin patterns
> 
> Remind developers that they do not need to go overboard to implement
> patterns to prepare for invalid constructs.  They only have to be
> sufficiently permissive, assuming that the payload is syntactically
> correct.
> 
> Text stolen mostly from Johannes Sixt.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  userdiff.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git c/userdiff.c w/userdiff.c
> index d9b2ba752f..1a6d27fda6 100644
> --- c/userdiff.c
> +++ w/userdiff.c
> @@ -13,6 +13,16 @@ static int drivers_alloc;
>  #define IPATTERN(name, pattern, word_regex)			\
>  	{ name, NULL, -1, { pattern, REG_EXTENDED | REG_ICASE }, \
>  	  word_regex "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" }
> +
> +/*
> + * Built-in drivers for various languages, sorted by their names
> + * (except that the "default" is left at the end).
> + *
> + * When writing or updating patterns, assume that the contents these
> + * patterns are applied to are syntactically correct.  You do not have
> + * to implement all syntactical corner cases---the patterns have to be
> + * sufficiently permissive.
> + */

IMO, as written, the comment falls short of suggesting that patterns can
be simple. How about appending "and can be simple"?

>  static struct userdiff_driver builtin_drivers[] = {
>  IPATTERN("ada",
>  	 "!^(.*[ \t])?(is[ \t]+new|renames|is[ \t]+separate)([ \t].*)?$\n"
> 



* Re: [PATCH v4] userdiff: improve java hunk header regex
  2021-08-11  5:22   ` Tassilo Horn
@ 2021-08-11  7:34     ` Johannes Sixt
  2021-08-11  7:39       ` Tassilo Horn
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2021-08-11  7:34 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: git, Junio C Hamano

On 11.08.21 at 07:22, Tassilo Horn wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
>> These new tests are very much appreciated. You do not have to go wild
>> with that many return type tests; IMO, the simple one and the most
>> complicated one should do it. (And btw, s/cart/card/)
> 
> Well, they appeared naturally as a result during development and made it
> easier to spot errors when you know up to which level of complexity it
> still worked.  Is there a stronger reason to remove tests which might
> not be needed, e.g., runtime cost on some CI machines?

I totally understand how the test cases evolved. Having many of them is
not a big deal. It's just the disproportion of tests of this new feature
vs. the existing tests that your patch creates, in particular, when
earlier of the new tests are subsumed by later new tests.

> Another thing I've noticed (with my suggested patch) is that I should
> not try to match constructor signatures.  I think that's impossible
> because they are indistinguishable from method calls, e.g., in
> 
>   public class MyClass {
>       MyClass(String RIGHT) {
>           someMethodCall();
>           someOtherMethod(17)
>               .doThat();
>           // Whatever
>           // ChangeMe
>       }
>   }
> 
> there is no regex way to prefer MyClass(String RIGHT) over
> someOtherMethod().

Good find.

> So all in all, I'd propose this version in the next patch version:
> 
> --8<---------------cut here---------------start------------->8---
> PATTERNS("java",
> 	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
>          "^[ \t]*("
>          /* Class, enum, and interface declarations */
>          "(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
>          /* Method definitions; note that constructor signatures are not */
>          /* matched because they are indistinguishable from method calls. */
>          "|(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
>          ")$",
> 	 /* -- */
> 	 "[a-zA-Z_][a-zA-Z0-9_]*"
> 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> 	 "|[-+*/<>%&^|=!]="
> 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> --8<---------------cut here---------------end--------------->8---

That looks fine.

One suggestion, though. You do not have to have all positive patterns
("class, enum, interface" and "method definitions") in a single pattern
separated by "|". You can place them on different "lines" (note the "\n"
at the end of the first pattern):

	/* Class, enum, and interface declarations */
	"^[ \t]*(...(class|enum|interface)...)$\n"
	/*
	 * Method definitions; note that constructor signatures are not
	 * matched because they are indistinguishable from method calls.
	 */
	"^[ \t]*(...[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*))$",

I don't think there is a technical difference, but I find this form
easier to understand because fewer open parentheses have to be tracked.
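
To make the "\n" form concrete, here is a rough sketch of how such a
newline-separated list can be evaluated. It rests on an assumption
about the semantics: the patterns are tried in order, the first one
that matches decides, and a leading "!" turns a match into a rejection.
The function is_funcname() is a made-up name and this is not git's
actual code. The two positive patterns are the alternatives from the
proposal quoted above, each anchored on its own line.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Negative pattern, then one positive pattern per "line". */
static const char java_patterns[] =
	"!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
	/* Class, enum, and interface declarations */
	"^[ \t]*(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+"
	"[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)$\n"
	/* Method definitions (constructors are deliberately not matched) */
	"^[ \t]*(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+"
	"[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$";

/*
 * Hypothetical evaluator: returns 1 if `line` would be used as a hunk
 * header, 0 otherwise. Assumes first-match-wins with "!" negation.
 */
static int is_funcname(const char *patterns, const char *line)
{
	const char *p = patterns;

	while (*p) {
		const char *end = strchr(p, '\n');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		int negate = (*p == '!');
		char *one = malloc(len + 1);
		regex_t re;
		int rc, hit;

		assert(one != NULL);
		/* Copy this pattern without its "!" prefix. */
		memcpy(one, p + negate, len - negate);
		one[len - negate] = '\0';

		rc = regcomp(&re, one, REG_EXTENDED);
		assert(rc == 0);
		hit = regexec(&re, line, 0, NULL, 0) == 0;
		regfree(&re);
		free(one);

		if (hit)
			return !negate;
		p += len;
		if (*p == '\n')
			p++;
	}
	return 0; /* no pattern matched */
}
```

Under these assumptions a line is accepted if either positive pattern
matches it, which is the same set of lines the single "|" alternation
accepts, so the split form only changes readability.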

-- Hannes


* Re: [PATCH v4] userdiff: improve java hunk header regex
  2021-08-11  7:34     ` Johannes Sixt
@ 2021-08-11  7:39       ` Tassilo Horn
  0 siblings, 0 replies; 9+ messages in thread
From: Tassilo Horn @ 2021-08-11  7:39 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano

Johannes Sixt <j6t@kdbg.org> writes:

Hi Hannes,

>>> These new tests are very much appreciated. You do not have to go
>>> wild with that many return type tests; IMO, the simple one and the
>>> most complicated one should do it. (And btw, s/cart/card/)
>> 
>> Well, they appeared naturally as a result during development and made
>> it easier to spot errors when you know up to which level of
>> complexity it still worked.  Is there a stronger reason to remove
>> tests which might not be needed, e.g., runtime cost on some CI
>> machines?
>
> I totally understand how the test cases evolved. Having many of them
> is not a big deal. It's just the disproportion of tests of this new
> feature vs. the existing tests that your patch creates, in particular,
> when earlier of the new tests are subsumed by later new tests.

Sure thing, I'll see if I can remove some tests.

>> Another thing I've noticed (with my suggested patch) is that I should
>> not try to match constructor signatures.  I think that's impossible
>> because they are indistinguishable from method calls, e.g., in
>> 
>>   public class MyClass {
>>       MyClass(String RIGHT) {
>>           someMethodCall();
>>           someOtherMethod(17)
>>               .doThat();
>>           // Whatever
>>           // ChangeMe
>>       }
>>   }
>> 
>> there is no regex way to prefer MyClass(String RIGHT) over
>> someOtherMethod().
>
> Good find.

The longer you play with it, the more you find out.

>> So all in all, I'd propose this version in the next patch version:
>> 
>> --8<---------------cut here---------------start------------->8---
>> PATTERNS("java",
>> 	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
>>          "^[ \t]*("
>>          /* Class, enum, and interface declarations */
>>          "(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
>>          /* Method definitions; note that constructor signatures are not */
>>          /* matched because they are indistinguishable from method calls. */
>>          "|(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
>>          ")$",
>> 	 /* -- */
>> 	 "[a-zA-Z_][a-zA-Z0-9_]*"
>> 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>> 	 "|[-+*/<>%&^|=!]="
>> 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
>> --8<---------------cut here---------------end--------------->8---
>
> That looks fine.
>
> One suggestion, though. You do not have to have all positive patterns
> ("class, enum, interface" and "method definitions") in a single
> pattern separated by "|". You can place them on different "lines"
> (note the "\n" at the end of the first pattern):
>
> 	/* Class, enum, and interface declarations */
> 	"^[ \t]*(...(class|enum|interface)...)$\n"
> 	/*
> 	 * Method definitions; note that constructor signatures are not
> 	 * matched because they are indistinguishable from method calls.
> 	 */
> 	"^[ \t]*(...[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*))$",
>
> I don't think there is a technical difference, but I find this form
> easier to understand because fewer open parentheses have to be
> tracked.

Yes, indeed.  Because of that reason I've put the first ( and the last )
on separate lines but your approach is even better.

Patch v5 will follow soon.

Thanks!
Tassilo


* Re: Re* [PATCH v4] userdiff: improve java hunk header regex
  2021-08-11  7:14     ` Johannes Sixt
@ 2021-08-11 16:04       ` Junio C Hamano
  2021-08-11 20:32         ` Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2021-08-11 16:04 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Tassilo Horn, git

Johannes Sixt <j6t@kdbg.org> writes:

>> + * When writing or updating patterns, assume that the contents these
>> + * patterns are applied to are syntactically correct.  You do not have
>> + * to implement all syntactical corner cases---the patterns have to be
>> + * sufficiently permissive.
>> + */
>
> IMO, as written, the comment falls short of suggesting that patterns can
> be simple. How about appending "and can be simple"?

    The patterns can be simple without implementing all syntactical
    corner cases, as long as they are sufficiently permissive.

perhaps?

Thanks.




* Re: Re* [PATCH v4] userdiff: improve java hunk header regex
  2021-08-11 16:04       ` Junio C Hamano
@ 2021-08-11 20:32         ` Johannes Sixt
  0 siblings, 0 replies; 9+ messages in thread
From: Johannes Sixt @ 2021-08-11 20:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tassilo Horn, git

On 11.08.21 at 18:04, Junio C Hamano wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>>> + * When writing or updating patterns, assume that the contents these
>>> + * patterns are applied to are syntactically correct.  You do not have
>>> + * to implement all syntactical corner cases---the patterns have to be
>>> + * sufficiently permissive.
>>> + */
>>
>> IMO, as written, the comment falls short of suggesting that patterns can
>> be simple. How about appending "and can be simple"?
> 
>     The patterns can be simple without implementing all syntactical
>     corner cases, as long as they are sufficiently permissive.
> 
> perhaps?

Perfect! Thank you.

-- Hannes

