Date: Fri, 15 Mar 2024 00:37:11 +0000 (UTC)
To: ruby-core@ml.ruby-lang.org
Subject: [ruby-core:117191] [Ruby master Bug#20225] Inconsistent behavior of regex matching for a regex has a null loop
From: "make_now_just (Hiroya Fujinami) via ruby-core"

Issue #20225 has been updated by make_now_just (Hiroya Fujinami).

At the dev meeting, matz concluded this, and I also think the null-loop bug must be fixed. However, some issues remain:

- I'm not sure what the correct behavior of capture-aware null-loop detection is. (I'm also not sure that TruffleRegex's behavior is correct.)
- I'm not sure it is possible to implement such correct behavior in Onigmo efficiently.
- I also wonder whether it can work together with memoization.

This issue is complex.

----------------------------------------
Bug #20225: Inconsistent behavior of regex matching for a regex has a null loop
https://bugs.ruby-lang.org/issues/20225#change-107280

* Author: make_now_just (Hiroya Fujinami)
* Status: Open
* Assignee: make_now_just (Hiroya Fujinami)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Usually, in Ruby (Onigmo), when a null loop (a loop that consumes no characters) occurs during regex matching, the loop is terminated.
But if the loop contains a capture and a certain complex condition is satisfied, backtracking occurs instead. This behavior produces unexpected results, for example:

```ruby
p /(?:.B.(?<a>(?:[C-Z]|.)*)+){2}/ =~ "ABCABC" # => nil
p /(?:.B.(?:(?:[C-Z]|.)*)+){2}/ =~ "ABCABC" # => 0
```

The first regex has a capture and the second does not, and this difference alone makes them return different results. It is counter-intuitive that the mere presence of a capture changes the matching result.

In detail, the null-loop behavior changes when 1) a previous capture in this loop holds the empty string, and 2) that capture's position differs from the current matching position. This condition is checked in `STACK_NULL_CHECK_MEMST` (https://github.com/ruby/ruby/blob/bbb7ab906ec64b963bd4b5d37e47b14796d64371/regexec.c#L1766-L1778).

You probably cannot make sense of what this condition means. Don't worry, I can't either. The condition has existed for at least 20 years, and perhaps no one remembers why it is needed. (If you know, please tell me!) Even if there is a reason, I do not believe it justifies counter-intuitive behavior such as the example above.

This behavior can also make memoization buggy. Memoization relies on the fact that backtracking depends only on positions and states (byte-code offsets in a regex). However, this condition additionally refers to captures, which breaks memoization.

My proposal is to **correct this inconsistent behavior**. Specifically, whether a loop is a null loop should be determined solely by whether the matching position has changed, without referring to captures.

This fix changes the behavior of regex matching, but I believe it is very unlikely to cause real backward-compatibility problems, because I have never seen this puzzling behavior mentioned anywhere before.
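The ordinary null-loop termination mentioned at the start of the description can be seen with a much simpler pattern. This is an illustrative sketch using only the standard `Regexp#match` and `MatchData`; the patterns are my own examples, not ones from the report.

```ruby
# `a*` can match the empty string, so `(?:a*)+` is a null loop on input without "a".
# An iteration that consumes nothing ends the loop instead of repeating forever.
m = /(?:a*)+/.match("b")
p m.begin(0) # => 0
p m[0]       # => ""

# With a capture in this simple case, the empty capture sits at the current
# position, so the loop is still simply terminated.
m = /(a*)+/.match("b")
p m[1]       # => ""
```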
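To make the two conditions above concrete, here is a simplified model of the decision taken when a loop iteration ends, contrasting the current check with the proposal. It is a hypothetical sketch: `Capture`, `current_null_check`, and `proposed_null_check` are illustrative names, not Onigmo APIs, and the model covers only the conditions stated in this report rather than every detail of `STACK_NULL_CHECK_MEMST`.

```ruby
# Hypothetical model; positions are byte offsets into the subject string.
# A capture recorded in the loop is represented by its start and stop offsets.
Capture = Struct.new(:start, :stop)

# Current behavior as described above: an iteration that consumed nothing normally
# terminates the loop, but if a capture in the loop holds the empty string at a
# position other than the current one, the matcher backtracks instead.
def current_null_check(iter_start, pos, captures)
  return :continue if pos != iter_start # input was consumed: not a null loop
  empty_elsewhere = captures.any? { |c| c.start == c.stop && c.stop != pos }
  empty_elsewhere ? :backtrack : :terminate
end

# Proposed behavior: decide solely by whether the matching position changed.
def proposed_null_check(iter_start, pos, _captures)
  pos == iter_start ? :terminate : :continue
end

caps = [Capture.new(1, 1)] # an empty capture left at offset 1
p current_null_check(3, 3, caps)                # => :backtrack
p proposed_null_check(3, 3, caps)               # => :terminate
p current_null_check(3, 3, [Capture.new(3, 3)]) # => :terminate
```

Under the proposal the captures argument is ignored, so the outcome depends only on the position and the loop being checked.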
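The memoization problem described above can be illustrated with a toy failure memo keyed only by `[state, pos]`, the assumption that memoization relies on. Everything here is hypothetical (`NULL_CHECK_STATE`, `null_check`, `visit`) and is not Onigmo's memoization code: the same state and position are reached twice with different captures, and the failure cached on the first visit wrongly prunes the second.

```ruby
# Toy sketch, not Onigmo's implementation.
NULL_CHECK_STATE = 42 # hypothetical byte-code offset of the null-check instruction

# Capture-aware outcome: backtrack if an empty capture sits at a different position.
null_check = lambda do |pos, empty_capture_pos|
  empty_capture_pos && empty_capture_pos != pos ? :backtrack : :terminate
end

failed = {} # [state, pos] => true once that pair is known to fail

visit = lambda do |pos, empty_capture_pos|
  key = [NULL_CHECK_STATE, pos]
  return :pruned if failed[key] # memo hit: assume it fails again
  outcome = null_check.call(pos, empty_capture_pos)
  failed[key] = true if outcome == :backtrack
  outcome
end

p visit.call(3, 1)      # => :backtrack (empty capture at offset 1), recorded as a failure
p visit.call(3, 3)      # => :pruned, although the real check would not fail here
p null_check.call(3, 3) # => :terminate
```

With a position-only check, the outcome at a given `[state, pos]` no longer depends on captures, so caching it is safe.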