From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN: AS22989 209.51.188.0/24
X-Spam-Status: No, score=-3.6 required=3.0 tests=AWL,BAYES_00,
    HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,
    RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS,SPF_PASS
    shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by dcvr.yhbt.net (Postfix) with ESMTPS id 818521F5AE
    for ; Wed, 16 Jun 2021 09:47:17 +0000 (UTC)
Received: from localhost ([::1]:48468 helo=lists1p.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.90_1)
    (envelope-from )
    id 1ltS8i-0005zc-0Y
    for normalperson@yhbt.net; Wed, 16 Jun 2021 05:47:16 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:47990)
    by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
    (Exim 4.90_1)
    (envelope-from )
    id 1ltS8e-0005xO-9i
    for bug-gnulib@gnu.org; Wed, 16 Jun 2021 05:47:12 -0400
Received: from air.basealt.ru ([194.107.17.39]:47886)
    by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
    (Exim 4.90_1)
    (envelope-from )
    id 1ltS8c-0007uo-1R
    for bug-gnulib@gnu.org; Wed, 16 Jun 2021 05:47:11 -0400
Received: by air.basealt.ru (Postfix, from userid 490)
    id 96033589508; Wed, 16 Jun 2021 09:47:05 +0000 (UTC)
Received: from EGORI-MACHINE.malta.altlinux.ru (obninsk.basealt.ru [217.15.195.17])
    by air.basealt.ru (Postfix) with ESMTPSA id 208BE589425;
    Wed, 16 Jun 2021 09:46:52 +0000 (UTC)
From: Egor Ignatov
To: ldv@altlinux.org, eggert@cs.ucla.edu
Subject: [PATCH] regex: fix backreference matching
Date: Wed, 16 Jun 2021 12:46:15 +0300
Message-Id: <20210616094615.186681-1-egori@altlinux.org>
X-Mailer: git-send-email 2.29.3
In-Reply-To: <20210607011027.GA18724@altlinux.org>
References: <20210607011027.GA18724@altlinux.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=194.107.17.39; envelope-from=egori@altlinux.org;
    helo=air.basealt.ru
X-Spam_score_int: -18
X-Spam_score: -1.9
X-Spam_bar: -
X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001,
    SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: bug-gnulib@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Gnulib discussion list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: bug-gnulib@gnu.org
Errors-To: bug-gnulib-bounces+normalperson=yhbt.net@gnu.org
Sender: "bug-gnulib"

This fixes a bug described in 70b673eb7.

* lib/regexec.c (set_regs): Revert pop condition changed in
the commit mentioned above.
(proceed_next_node): Always proceed on OP_BACK_REF to the next
node if naccepted is 0.
(update_regs): Fix optional sub expression boundaries matching.
* tests/test-regex.c: Fix tests.

Signed-off-by: Egor Ignatov
---
 lib/regexec.c      | 12 ++++++------
 tests/test-regex.c |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/regexec.c b/lib/regexec.c
index 5d4113c9d..23b984a21 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -1292,9 +1292,9 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
               if (__glibc_unlikely (! ok))
                 return -2;
               dest_node = dfa->edests[node].elems[0];
-              if (re_node_set_contains (&mctx->state_log[*pidx]->nodes,
-                                        dest_node))
-                return dest_node;
+              if(dfa->nodes[dest_node].type == END_OF_RE)
+                regs[0].rm_eo = *pidx;
+              return dest_node;
             }
         }
 
@@ -1413,8 +1413,7 @@ set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
     {
       update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
 
-      if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
-          || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
+      if (idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
         {
           Idx reg_idx;
           cur_node = -1;
@@ -1514,7 +1513,8 @@ update_regs (const re_dfa_t *dfa, regmatch_t *pmatch,
           else
             {
               if (dfa->nodes[cur_node].opt_subexp
-                  && prev_idx_match[reg_num].rm_so != -1)
+                  && prev_idx_match[reg_num].rm_so != -1
+                  && pmatch[reg_num].rm_eo != -1)
                 /* We transited through an empty match for an optional
                    subexpression, like (a?)*, and this is not the subexp's
                    first match.  Copy back the old content of the registers
diff --git a/tests/test-regex.c b/tests/test-regex.c
index 7ea73cfb6..f73909258 100644
--- a/tests/test-regex.c
+++ b/tests/test-regex.c
@@ -119,7 +119,7 @@ static struct
     /* Test for *+ match.  */
     { "^a*+(.)", "ab", REG_EXTENDED, 2, { { 0, 2 }, { 1, 2 } } },
     /* Test for ** match.  */
-    { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 0, 1 }, { 1, 2 } } },
+    { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 1, 1 }, { 1, 2 } } },
   };
 
 static void
@@ -431,7 +431,7 @@ main (void)
       else if (! (regs.start[0] == 0 && regs.end[0] == 1))
         report_error ("re_search '%s' on '%s' returned wrong match [%d,%d)",
                       pat_sub2, data, (int) regs.start[0], (int) regs.end[0]);
-      else if (! (regs.start[1] == 0 && regs.end[1] == 0))
+      else if (! (regs.start[1] == 1 && regs.end[1] == 1))
         report_error ("re_search '%s' on '%s' returned wrong submatch [%d,%d)",
                       pat_sub2, data, regs.start[1], regs.end[1]);
       regfree (&regex);
-- 
2.29.3
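
To check which registers a given regex implementation reports for the "^(a*)*(.)" test case touched by this patch, a minimal sketch using the POSIX regcomp/regexec interface could look like the following. probe_submatches is a hypothetical helper introduced only for illustration; it is not part of gnulib or of this patch. The whole-match and second-group offsets should be the same everywhere, but the offsets reported for the first group depend on the implementation and on whether this change is applied.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of gnulib): compile PATTERN as a POSIX
   extended regex, match it against SUBJECT, and store up to NMATCH
   register pairs in PMATCH.  Return 0 on a match, otherwise the
   nonzero code returned by regcomp or regexec.  */
int
probe_submatches (const char *pattern, const char *subject,
                  size_t nmatch, regmatch_t *pmatch)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && subject != NULL && pmatch != NULL);

  rc = regcomp (&re, pattern, REG_EXTENDED);
  if (rc != 0)
    return rc;

  /* pmatch[0] receives the whole match, pmatch[1..] the subexpressions.  */
  rc = regexec (&re, subject, nmatch, pmatch, 0);
  regfree (&re);
  return rc;
}
```

Calling probe_submatches ("^(a*)*(.)", "ab", 3, m) with a three-element regmatch_t array m and then printing m[1].rm_so and m[1].rm_eo shows whether the library reports the first group the way the old or the updated test expects.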