From: Johannes Sixt
To: Jaydeep P Das
Cc: git@vger.kernel.org, Junio C Hamano
Subject: Re: [PATCH] userdiff: add builtin driver for kotlin language
Date: Wed, 2 Mar 2022 09:00:34 +0100
Message-ID: <34a2ad39-604c-4edd-ea1c-de1212fc506b@kdbg.org>
In-Reply-To: <20220302064504.2651079-2-jaydeepjd.8914@gmail.com>
References: <20220301070226.2477769-1-jaydeepjd.8914@gmail.com>
 <20220302064504.2651079-1-jaydeepjd.8914@gmail.com>
 <20220302064504.2651079-2-jaydeepjd.8914@gmail.com>

Added jc to Cc:.

On 02.03.22 at 07:45, Jaydeep P Das wrote:
> diff --git a/t/t4034/kotlin/expect b/t/t4034/kotlin/expect
> new file mode 100644
> index 0000000000..8acdc83bcc
> --- /dev/null
> +++ b/t/t4034/kotlin/expect
> @@ -0,0 +1,35 @@
> +diff --git a/pre b/post
> +index 884560d..7e136e2 100644
> +--- a/pre
> ++++ b/post
> +@@ -1,19 +1,19 @@
> +println("Hello World!\n?")
> +(1) (-1e10) (0xabcdef) 'xy'
> +100000100_000

This test does not demonstrate that numbers do not end at an '_'. If a
number did end there, the change would be from the single token 100000
to the two tokens 100 and _000, the mark-up would look exactly the same
as what we see here, and the bug would remain undiagnosed.

Instead, write the pre-image as 100_000 and the post-image as 200_000.
Then the correct mark-up would be

    <RED>100_000<RESET><GREEN>200_000<RESET>

and a bogus mark-up (the one that the test wants to diagnose) would
look like

    <RED>100<RESET><GREEN>200<RESET>_000

> +[ax] ax->b ay x.by
> +!a ax x.inv() ax*b ay x&by
> +a+=-=b

OK, so you decided to check the operators += and -=. But what about all
the other multi-character operators?

> +ax*b ay x/b ay x%b
> +ay
> +x+b ay x-b
> +ay
> +x shl b ay x shr b
> +ay
> +x<b ay x<=b ay x>b ay x>=b
> +ay
> +x==b ay x!=b ay x===b
> +ay
> +x and b
> +ay
> +x^b
> +ay
> +x or b
> +ay
> +x&&b
> +ay
> +x||b
> +ay
> +x=b ay x+=b ay x-=b ay x*=b ay x/=b ay x%=b ay x<<=b ay x>>=b ay x&=b ay x^=b ay x|=b

This line is the best candidate to check many multi-character
operators. For example, the pre-image could read

    a=b c+=d e-=f g*=h i/=j k%=l m<<=n o>>=p q&=r s^=t u|=v

and the post-image

    a+=b c=d e<=f g>=h i/j k%l m<<n o>>p q&r s^t u|v

but there are more operators to check.

Please either make these changes or drop this t4034 test case, because
in its current form it gives a false sense of security, IMHO.

> +ay
> +x,y
> +-ax+2

What do you want to demonstrate with this new test case? If you want to
show that the + in +2 is not part of the number, then you must change,
for example, "a+2" to "a+1".
If you change only the a to x, then we do not know whether the +2 was
regarded as one token or two.

> diff --git a/userdiff.c b/userdiff.c
> index 8578cb0d12..b92572b582 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -168,6 +168,14 @@ PATTERNS("java",
>       "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>       "|[-+*/<>%&^|=!]="
>       "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +PATTERNS("kotlin",
> +     "^[ \t]*(([a-z]+[ \t]+)*(fun|class|interface)[ \t]+.*)$",
> +     /* -- */
> +     "[_]?[a-zA-Z][a-zA-Z0-9_]*"

An underscore followed by a digit is not an identifier, but a number,
right? Then this expression correctly does not match and the following
expression dedicated to numbers takes care of it. Good.

> +     /*hexadecimal, integers and binary numbers*/
> +     "|(0x0F|0b)?[0-9._]+([Ee][-+]?[0-9]+)?[fFlLuU]*"

What is this "0x0F"? Did you mean just "0x"? And what about the
prefixes 0X and 0B? Are they not used as prefixes for hex and binary
numbers? Moreover, I do not see how a hex number 0xff would be matched
as a single token.

> +     /*match unary and binary operators*/
> +     "|[-+*/<>%&^|=!]*"),

Do not do this. There is an implicit single-character match that need
not be written down in the regex. List all multi-character operators
(but not the single-character operators) like you did in earlier
rounds. As written, the "++!=" in an expression such as "a++!=b++"
(which is not unlikely to be seen in real code) would be regarded as a
single token.

The verb "match" in the comment does not match the style of the other
comments (drop the word), and please insert blanks between the comment
delimiters and the text.

>  PATTERNS("markdown",
>       "^ {0,3}#{1,6}[ \t].*",
>       /* -- */

-- Hannes
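
The two tokenization points above (the "0xff" split and the "++!=" run)
can be checked with a small sketch. It is written in Python rather than
C and only approximates how git applies a word regex: it assumes that a
catch-all for any single non-whitespace character is tried alongside
the listed alternatives, and that the longest match at each position
wins. The names tokenize, PATCH and LISTED are made up for this
illustration, and the LISTED operators are simply copied from the
"java" context lines of the quoted diff, not a proposed Kotlin list.

```python
import re

# Assumption: any single non-whitespace character is a token of its own
# when no longer alternative matches (the "implicit single-character match").
CATCH_ALL = r"[^\s]"

IDENT = r"[_]?[a-zA-Z][a-zA-Z0-9_]*"
NUMBER = r"(0x0F|0b)?[0-9._]+([Ee][-+]?[0-9]+)?[fFlLuU]*"

# Word regex alternatives as posted in the patch under review.
PATCH = [IDENT, NUMBER, r"[-+*/<>%&^|=!]*"]

# Same identifier and number alternatives, but with the multi-character
# operators listed explicitly (taken from the "java" driver for illustration).
LISTED = [IDENT, NUMBER, r"[-+*/<>%&^|=!]=", r"--", r"\+\+",
          r"<<=?", r">>>?=?", r"&&", r"\|\|"]


def tokenize(alternatives, text):
    """Split text into tokens, taking the longest alternative at each position."""
    compiled = [re.compile(a) for a in alternatives + [CATCH_ALL]]
    tokens, pos = [], 0
    while pos < len(text):
        longest = 0
        for rx in compiled:
            m = rx.match(text, pos)
            if m:
                longest = max(longest, m.end() - pos)
        if longest == 0:
            pos += 1  # whitespace: nothing matched a non-empty span here
        else:
            tokens.append(text[pos:pos + longest])
            pos += longest
    return tokens


print(tokenize(PATCH, "a++!=b++"))   # ['a', '++!=', 'b', '++']
print(tokenize(LISTED, "a++!=b++"))  # ['a', '++', '!=', 'b', '++']
print(tokenize(PATCH, "0xff"))       # ['0', 'xff']
print(tokenize(PATCH, "100_000"))    # ['100_000']
```

With the character-class-star alternative, "++!=" comes out as one
token, whereas the explicit list splits it into "++" and "!=". The hex
literal is split into "0" and "xff" because the "0x0F" prefix never
matches "0xf".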