From mboxrd@z Thu Jan 1 00:00:00 1970
From: Egor Ignatov
To: ldv@altlinux.org
Subject: [PATCH v2] regex: fix backreference matching
Date: Fri, 9 Jul 2021 15:36:43 +0300
Message-Id: <20210709123643.60443-1-egori@altlinux.org>
X-Mailer: git-send-email 2.29.3
In-Reply-To: <20210705121201.GA20072@altlinux.org>
References: <20210705121201.GA20072@altlinux.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: Gnulib discussion list
Cc: eggert@cs.ucla.edu, bug-gnulib@gnu.org

* lib/regexec.c (proceed_next_node): Disable the dest_node check if
we have backrefs.
(set_regs): Finish set_regs when we are at the last node and all
regs have been set.
(set_regs): Also shrink the match if we are ready to finish but
didn't accept the entire string matched by check_matching.

This is needed because check_matching may return a wrong match for
a regexp with back-references.  For example, for the regex
'(a*)*(.)\1' and the string 'ab', check_matching returns the match
'ab', whereas the match should be just 'a', captured by the second
group.

All built-in tests, as well as the tests from sed and grep, pass.

Signed-off-by: Egor Ignatov
---
 lib/regexec.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/lib/regexec.c b/lib/regexec.c
index 5e4eb497a..8f0f14575 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -1233,7 +1233,7 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
       for (Idx i = 0; i < edests->nelem; i++)
         {
           Idx candidate = edests->elems[i];
-          if (!re_node_set_contains (cur_nodes, candidate))
+          if (!dfa->nbackref && !re_node_set_contains (cur_nodes, candidate))
             continue;
           if (dest_node == -1)
             dest_node = candidate;
@@ -1296,9 +1296,7 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
               if (__glibc_unlikely (! ok))
                 return -2;
               dest_node = dfa->edests[node].elems[0];
-              if (re_node_set_contains (&mctx->state_log[*pidx]->nodes,
-                                        dest_node))
-                return dest_node;
+              return dest_node;
             }
         }
 
@@ -1308,8 +1306,9 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
           Idx dest_node = dfa->nexts[node];
           *pidx = (naccepted == 0) ? *pidx + 1 : *pidx + naccepted;
           if (fs && (*pidx > mctx->match_last || mctx->state_log[*pidx] == NULL
-                     || !re_node_set_contains (&mctx->state_log[*pidx]->nodes,
-                                               dest_node)))
+                     || (!dfa->nbackref &&
+                         !re_node_set_contains (&mctx->state_log[*pidx]->nodes,
+                                                dest_node))))
             return -1;
           re_node_set_empty (eps_via_nodes);
           return dest_node;
@@ -1417,8 +1416,7 @@ set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
     {
       update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
 
-      if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
-          || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
+      if (cur_node == mctx->last_node)
         {
           Idx reg_idx;
           cur_node = -1;
@@ -1434,6 +1432,7 @@ set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
             }
           if (cur_node < 0)
             {
+              pmatch[0].rm_eo = idx;
               re_node_set_free (&eps_via_nodes);
               regmatch_list_free (&prev_match);
               return free_fail_stack_return (fs);
-- 
2.29.3
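
For anyone who wants to check the example from the commit message by hand, a minimal sketch of a reproducer follows. It is not part of the patch. match_span is a hypothetical helper that uses only the POSIX regcomp/regexec interface, and the pattern is assumed to be spelled in BRE syntax (\(a*\)*\(.\)\1) for portability rather than in the ERE form quoted in the commit message.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of gnulib): compile PATTERN as a POSIX
   basic regular expression, match it against TEXT, and store the span
   of the whole match in *START and *END.  Return 0 on a match,
   otherwise the nonzero regcomp/regexec error code.  */
static int
match_span (const char *pattern, const char *text, int *start, int *end)
{
  regex_t re;
  regmatch_t pm[3];             /* whole match plus two groups */
  int rc;

  assert (pattern != NULL && text != NULL);

  rc = regcomp (&re, pattern, 0);       /* cflags 0 selects BRE syntax */
  if (rc != 0)
    return rc;

  rc = regexec (&re, text, 3, pm, 0);
  if (rc == 0)
    {
      *start = (int) pm[0].rm_so;
      *end = (int) pm[0].rm_eo;
    }

  regfree (&re);
  return rc;
}
```

Calling match_span ("\\(a*\\)*\\(.\\)\\1", "ab", &s, &e) and printing s and e shows which span the matcher in use reports. Going by the commit message, an unpatched matcher would be expected to report the two-character match 'ab', and a patched one the one-character match 'a'.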