git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Tassilo Horn <tsdh@gnu.org>
To: Johannes Sixt <j6t@kdbg.org>, Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v4] userdiff: improve java hunk header regex
Date: Wed, 11 Aug 2021 07:22:06 +0200	[thread overview]
Message-ID: <87zgtoh6bm.fsf@gnu.org> (raw)
In-Reply-To: <d3484278-8413-0d10-e6cd-59a7ff04564b@kdbg.org>

Johannes Sixt <j6t@kdbg.org> writes:

Hi Hannes & Junio,

> These new tests are very much appreciated. You do not have to go wild
> with that many return type tests; IMO, the simple one and the most
> complicated one should do it. (And btw, s/cart/card/)

Well, they appeared naturally as a result during development and made it
easier to spot errors when you know up to which level of complexity it
still worked.  Is there a stronger reason to remove tests which might
not be needed, e.g., runtime cost on some CI machines?

>> -	 "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
>> +         "^[ \t]*("
>> +         /* Class, enum, and interface declarations: */
>> +         /*   optional modifiers: public */
>> +         "(([a-z]+[ \t]+)*"
>> +         /*   the kind of declaration */
>> +         "(class|enum|interface)[ \t]+"
>> +         /*   the name */
>> +         "[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
>> +         /* Method & constructor signatures: */
>> +         /*   optional modifiers: public static */
>> +         "|(([a-z]+[ \t]+)*"
>> +         /*   type params and return types for methods but not constructors */
>> +         "("
>> +         /*     optional type parameters: <A, B extends Comparable<B>> */
>> +         "(<[A-Za-z0-9_,.&<> \t]+>[ \t]+)?"
>> +         /*     return type: java.util.Map<A, B[]> or List<?> */
>> +         "([A-Za-z_]([A-Za-z_0-9<>,.?]|\\[[ \t]*\\])*[ \t]+)+"
>> +         /*   end of type params and return type */
>> +         ")?"
>> +         /*   the method name followed by the parameter list: myMethod(...) */
>> +         "[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
>> +         ")$",
>
> I don't see the point in this complicated regex. Please recall that it
> will be applied only to syntactically correct Java text. Therefore,
> you do not have to implement all syntactical corner cases, just be
> sufficiently permissive.

I actually find it easier to understand if it is broken up into more
concrete alternatives and parts which are commented instaed of one
opaque "permissively match everything in one alternative" regex.  It
shows the intent of what you want to match.  But YMMV and since Junio
agrees with you, I'm fine with that approach.

> What is wrong with
>
> 	"^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[
> \t]*\\([^;]*)$",

That doesn't work for

  <T> List<T> foo()

or

  <T extends Foo & Bar> T foo()

so at least it needs to include &<> in the first group, too.

Also, it doesn't match class/enum/interface declarations anymore, so

  class Foo {
    String x = "ChangeMe";
  }

will have an empty hunk header.

Another thing I've noticed (with my suggested patch) is that I should
not try to match constructor signatures.  I think that's impossible
because they are indistinguishable from method calls, e.g., in

  public class MyClass {
      MyClass(String RIGHT) {
          someMethodCall();
          someOtherMethod(17)
              .doThat();
          // Whatever
          // ChangeMe
      }
  }

there is no regex way to prefer MyClass(String RIGHT) over
someOtherMethod().

So all in all, I'd propose this version in the next patch version:

--8<---------------cut here---------------start------------->8---
PATTERNS("java",
	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
         "^[ \t]*("
         /* Class, enum, and interface declarations */
         "(([a-z]+[ \t]+)*(class|enum|interface)[ \t]+[A-Za-z][A-Za-z0-9_$]*[ \t]+.*)"
         /* Method definitions; note that constructor signatures are not */
         /* matched because they are indistinguishable from method calls. */
         "|(([A-Za-z_<>&][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)"
         ")$",
	 /* -- */
	 "[a-zA-Z_][a-zA-Z0-9_]*"
	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
	 "|[-+*/<>%&^|=!]="
	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
--8<---------------cut here---------------end--------------->8---

That works for all my test cases (which I have also altered to include
the method calls from above before the ChangeMe) except for
java-constructor where it shows

  public class MyClass {

instead of

      MyClass(String RIGHT) {

in the hunk header which is expected as explained earlier and in the
comment.

Does that seem like a good middle ground?

Bye,
Tassilo

  parent reply	other threads:[~2021-08-11  5:57 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-10 19:09 [PATCH v4] userdiff: improve java hunk header regex Tassilo Horn
2021-08-10 20:57 ` Johannes Sixt
2021-08-10 22:12   ` Re* " Junio C Hamano
2021-08-11  7:14     ` Johannes Sixt
2021-08-11 16:04       ` Junio C Hamano
2021-08-11 20:32         ` Johannes Sixt
2021-08-11  5:22   ` Tassilo Horn [this message]
2021-08-11  7:34     ` Johannes Sixt
2021-08-11  7:39       ` Tassilo Horn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zgtoh6bm.fsf@gnu.org \
    --to=tsdh@gnu.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).