bug-gnulib@gnu.org mirror (unofficial)
* [PATCH] regex: fix match with possessive quantifier
@ 2021-05-26  9:08 Egor Ignatov
  2021-06-06 21:45 ` Dmitry V. Levin
  0 siblings, 1 reply; 13+ messages in thread
From: Egor Ignatov @ 2021-05-26  9:08 UTC (permalink / raw)
  To: eggert; +Cc: bug-gnulib

Fix behaviour introduced in 70b673e, where regexps with
possessive quantifier("*+") didn't match.
* lib/regexec.c
(set_regs): Pop if CUR_NODE has already been checked only when
we have a fail stack.

Signed-off-by: Egor Ignatov <egori@altlinux.org>
---
Hi Paul,

Do you have any test cases for bug 11053(glibc) for gnulib?
This patch fixes the issue with "*+", but I'm not sure it
doesn't break your fix for 11053.

Best regards,
Egor

 lib/regexec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/regexec.c b/lib/regexec.c
index 6309deac8..5d4113c9d 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -1414,7 +1414,7 @@ set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
       update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
 
       if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
-	  || re_node_set_contains (&eps_via_nodes, cur_node))
+	  || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
 	{
 	  Idx reg_idx;
 	  cur_node = -1;
-- 
2.29.3
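
To see how a given regex build behaves for patterns like the one in the commit message, the registers can be read through the POSIX interface. The following is a minimal sketch, not part of the patch; the helper name match_regs is hypothetical, and it uses only regcomp, regexec and regfree from <regex.h>.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compile PATTERN as a POSIX extended regex, run it on TEXT, and store
   up to NMATCH registers in REGS.  Returns 0 on a match, REG_NOMATCH
   when TEXT does not match, or the regcomp error code when PATTERN
   does not compile.  (Hypothetical helper, for illustration only.)  */
int
match_regs (const char *pattern, const char *text,
            size_t nmatch, regmatch_t *regs)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);
  rc = regcomp (&re, pattern, REG_EXTENDED);
  if (rc != 0)
    return rc;
  rc = regexec (&re, text, nmatch, regs, 0);
  regfree (&re);
  return rc;
}
```

Calling match_regs ("^a*+(.)", "ab", 2, regs) exercises the "*+" case; whether it returns 0 or REG_NOMATCH depends on whether the regex implementation in use is affected.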




* Re: [PATCH] regex: fix match with possessive quantifier
  2021-05-26  9:08 [PATCH] regex: fix match with possessive quantifier Egor Ignatov
@ 2021-06-06 21:45 ` Dmitry V. Levin
  2021-06-07  1:10   ` Dmitry V. Levin
  2021-06-21 21:09   ` Paul Eggert
  0 siblings, 2 replies; 13+ messages in thread
From: Dmitry V. Levin @ 2021-06-06 21:45 UTC (permalink / raw)
  To: Egor Ignatov; +Cc: Paul Eggert, bug-gnulib

On Wed, May 26, 2021 at 12:08:19PM +0300, Egor Ignatov wrote:
> Fix behaviour introduced in 70b673e, where regexps with
> possessive quantifier("*+") didn't match.
> * lib/regexec.c
> (set_regs): Pop if CUR_NODE has already been checked only when
> we have a fail stack.
> 
> Signed-off-by: Egor Ignatov <egori@altlinux.org>
> ---
> Hi Paul,
> 
> Do you have any test cases for bug 11053(glibc) for gnulib?
> This patch fixes the issue with "*+", but I'm not sure it
> doesn't break your fix for 11053.

Thanks, the fix looks plausible, it doesn't break any tests
(including those introduced along with commit 70b673eb7),
so I've applied it now, and the following follow-up patch:

diff --git a/tests/test-regex.c b/tests/test-regex.c
index 3de6213ff..7ea73cfb6 100644
--- a/tests/test-regex.c
+++ b/tests/test-regex.c
@@ -116,6 +116,10 @@ static struct
     "level", REG_NOSUB | REG_EXTENDED, 0, { { -1, -1 } } },
   { "^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\\9\\8\\7\\6\\5\\4\\3\\2\\1$",
     "ababababa", REG_EXTENDED, 1, { { 0, 9 } } },
+  /* Test for *+ match.  */
+  { "^a*+(.)", "ab", REG_EXTENDED, 2, { { 0, 2 }, { 1, 2 } } },
+  /* Test for ** match.  */
+  { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 0, 1 }, { 1, 2 } } },
 };
 
 static void

-- 
ldv



* Re: [PATCH] regex: fix match with possessive quantifier
  2021-06-06 21:45 ` Dmitry V. Levin
@ 2021-06-07  1:10   ` Dmitry V. Levin
  2021-06-16  9:46     ` [PATCH] regex: fix backreference matching Egor Ignatov
  2021-06-16 10:18     ` [PATCH] regex: fix match with possessive quantifier Dmitry V. Levin
  2021-06-21 21:09   ` Paul Eggert
  1 sibling, 2 replies; 13+ messages in thread
From: Dmitry V. Levin @ 2021-06-07  1:10 UTC (permalink / raw)
  To: Egor Ignatov, Paul Eggert; +Cc: bug-gnulib

On Mon, Jun 07, 2021 at 12:45:02AM +0300, Dmitry V. Levin wrote:
> On Wed, May 26, 2021 at 12:08:19PM +0300, Egor Ignatov wrote:
> > Fix behaviour introduced in 70b673e, where regexps with
> > possessive quantifier("*+") didn't match.
> > * lib/regexec.c
> > (set_regs): Pop if CUR_NODE has already been checked only when
> > we have a fail stack.
> > 
> > Signed-off-by: Egor Ignatov <egori@altlinux.org>
> > ---
> > Hi Paul,
> > 
> > Do you have any test cases for bug 11053(glibc) for gnulib?
> > This patch fixes the issue with "*+", but I'm not sure it
> > doesn't break your fix for 11053.
> 
> Thanks, the fix looks plausible, it doesn't break any tests
> (including those introduced along with commit 70b673eb7),

Apparently, there are more issues with commit 70b673eb7, for example:

$ echo ab | sed -E 's/^(a*)*(.)\1/\1/'
Segmentation fault

$ echo ab | strace -enone -- sed --debug -E 's/^(a*)*(.)\1/\1/'
SED PROGRAM:
  s/^(a*)*(.)\\1/\1/
INPUT:   'STDIN' line 1
PATTERN: ab
COMMAND: s/^(a*)*(.)\\1/\1/
MATCHED REGEX REGISTERS
  regex[0] = 0-2 'ab'
  regex[1] = 0--1 'ab!!ab
'
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x20} ---
+++ killed by SIGSEGV +++
Segmentation fault


-- 
ldv
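
The register dump above shows regex[1] with start 0 and end -1, a pair that no well-formed match should report. The real fix belongs in regexec.c, but a caller can also refuse to use such a register. A minimal sketch under that assumption follows; copy_submatch is a hypothetical helper and is not taken from sed or gnulib.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the text covered by register REG of SUBJECT into BUF, which has
   room for BUFSIZE bytes including the terminating NUL.  Returns the
   number of bytes copied, or -1 if the register is unset, inconsistent
   (end before start), out of range, or too long for BUF.
   (Hypothetical helper, for illustration only.)  */
long
copy_submatch (const char *subject, const regmatch_t *reg,
               char *buf, size_t bufsize)
{
  size_t subject_len, len;

  assert (subject != NULL && reg != NULL && buf != NULL);
  subject_len = strlen (subject);

  /* Rejects both { -1, -1 } (unset) and pairs such as { 0, -1 }.  */
  if (reg->rm_so < 0 || reg->rm_eo < reg->rm_so)
    return -1;
  if ((size_t) reg->rm_eo > subject_len)
    return -1;

  len = (size_t) (reg->rm_eo - reg->rm_so);
  if (len >= bufsize)
    return -1;
  memcpy (buf, subject + reg->rm_so, len);
  buf[len] = '\0';
  return (long) len;
}
```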



* [PATCH] regex: fix backreference matching
  2021-06-07  1:10   ` Dmitry V. Levin
@ 2021-06-16  9:46     ` Egor Ignatov
  2021-06-16 10:13       ` Dmitry V. Levin
  2021-06-16 10:18     ` [PATCH] regex: fix match with possessive quantifier Dmitry V. Levin
  1 sibling, 1 reply; 13+ messages in thread
From: Egor Ignatov @ 2021-06-16  9:46 UTC (permalink / raw)
  To: ldv, eggert; +Cc: bug-gnulib

This fixes a bug described in 70b673eb7.

* lib/regexec.c (set_regs): Revert pop condition changed in the
commit mentioned above.
(proceed_next_node): Always proceed on OP_BACK_REF to the
next node if naccepted is 0.
(update_regs): Fix optional sub expression boundaries matching.
* tests/test-regex.c: Fix tests.

Signed-off-by: Egor Ignatov <egori@altlinux.org>
---
 lib/regexec.c      | 12 ++++++------
 tests/test-regex.c |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/regexec.c b/lib/regexec.c
index 5d4113c9d..23b984a21 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -1292,9 +1292,9 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
 	      if (__glibc_unlikely (! ok))
 		return -2;
 	      dest_node = dfa->edests[node].elems[0];
-	      if (re_node_set_contains (&mctx->state_log[*pidx]->nodes,
-					dest_node))
-		return dest_node;
+	      if(dfa->nodes[dest_node].type == END_OF_RE)
+		regs[0].rm_eo = *pidx;
+	      return dest_node;
 	    }
 	}
 
@@ -1413,8 +1413,7 @@ set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
     {
       update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
 
-      if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
-	  || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
+      if (idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
 	{
 	  Idx reg_idx;
 	  cur_node = -1;
@@ -1514,7 +1513,8 @@ update_regs (const re_dfa_t *dfa, regmatch_t *pmatch,
 	  else
 	    {
 	      if (dfa->nodes[cur_node].opt_subexp
-		  && prev_idx_match[reg_num].rm_so != -1)
+		  && prev_idx_match[reg_num].rm_so != -1
+		  && pmatch[reg_num].rm_eo != -1)
 		/* We transited through an empty match for an optional
 		   subexpression, like (a?)*, and this is not the subexp's
 		   first match.  Copy back the old content of the registers
diff --git a/tests/test-regex.c b/tests/test-regex.c
index 7ea73cfb6..f73909258 100644
--- a/tests/test-regex.c
+++ b/tests/test-regex.c
@@ -119,7 +119,7 @@ static struct
   /* Test for *+ match.  */
   { "^a*+(.)", "ab", REG_EXTENDED, 2, { { 0, 2 }, { 1, 2 } } },
   /* Test for ** match.  */
-  { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 0, 1 }, { 1, 2 } } },
+  { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 1, 1 }, { 1, 2 } } },
 };
 
 static void
@@ -431,7 +431,7 @@ main (void)
       else if (! (regs.start[0] == 0 && regs.end[0] == 1))
         report_error ("re_search '%s' on '%s' returned wrong match [%d,%d)",
                       pat_sub2, data, (int) regs.start[0], (int) regs.end[0]);
-      else if (! (regs.start[1] == 0 && regs.end[1] == 0))
+      else if (! (regs.start[1] == 1 && regs.end[1] == 1))
         report_error ("re_search '%s' on '%s' returned wrong submatch [%d,%d)",
                       pat_sub2, data, regs.start[1], regs.end[1]);
       regfree (&regex);
-- 
2.29.3




* Re: [PATCH] regex: fix backreference matching
  2021-06-16  9:46     ` [PATCH] regex: fix backreference matching Egor Ignatov
@ 2021-06-16 10:13       ` Dmitry V. Levin
  2021-06-29  8:51         ` Egor Ignatov
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry V. Levin @ 2021-06-16 10:13 UTC (permalink / raw)
  To: Egor Ignatov; +Cc: Paul Eggert, bug-gnulib

On Wed, Jun 16, 2021 at 12:46:15PM +0300, Egor Ignatov wrote:
> This fixes a bug described in 70b673eb7.
[...]
> -  { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 0, 1 }, { 1, 2 } } },
> +  { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 1, 1 }, { 1, 2 } } },

Sorry, but how could this be correct?
Since the expression consists of two consecutive parts, the whole match
should also consist of two consecutive substring matches, shouldn't it?


-- 
ldv



* Re: [PATCH] regex: fix match with possessive quantifier
  2021-06-07  1:10   ` Dmitry V. Levin
  2021-06-16  9:46     ` [PATCH] regex: fix backreference matching Egor Ignatov
@ 2021-06-16 10:18     ` Dmitry V. Levin
  1 sibling, 0 replies; 13+ messages in thread
From: Dmitry V. Levin @ 2021-06-16 10:18 UTC (permalink / raw)
  To: Egor Ignatov; +Cc: Paul Eggert, bug-gnulib

On Mon, Jun 07, 2021 at 04:10:27AM +0300, Dmitry V. Levin wrote:
> On Mon, Jun 07, 2021 at 12:45:02AM +0300, Dmitry V. Levin wrote:
> > On Wed, May 26, 2021 at 12:08:19PM +0300, Egor Ignatov wrote:
> > > Fix behaviour introduced in 70b673e, where regexps with
> > > possessive quantifier("*+") didn't match.
> > > * lib/regexec.c
> > > (set_regs): Pop if CUR_NODE has already been checked only when
> > > we have a fail stack.
> > > 
> > > Signed-off-by: Egor Ignatov <egori@altlinux.org>
> > > ---
> > > Hi Paul,
> > > 
> > > Do you have any test cases for bug 11053(glibc) for gnulib?
> > > This patch fixes the issue with "*+", but I'm not sure it
> > > doesn't break your fix for 11053.
> > 
> > Thanks, the fix looks plausible, it doesn't break any tests
> > (including those introduced along with commit 70b673eb7),
> 
> Apparently, there are more issues with commit 70b673eb7, for example:
> 
> $ echo ab | sed -E 's/^(a*)*(.)\1/\1/'
> Segmentation fault
> 
> $ echo ab | strace -enone -- sed --debug -E 's/^(a*)*(.)\1/\1/'
> SED PROGRAM:
>   s/^(a*)*(.)\\1/\1/
> INPUT:   'STDIN' line 1
> PATTERN: ab
> COMMAND: s/^(a*)*(.)\\1/\1/
> MATCHED REGEX REGISTERS
>   regex[0] = 0-2 'ab'
>   regex[1] = 0--1 'ab!!ab
> '
> --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x20} ---
> +++ killed by SIGSEGV +++
> Segmentation fault

And here is a tests/test-regex.c entry for this bug:

diff --git a/tests/test-regex.c b/tests/test-regex.c
index 7ea73cfb6..fdb1a1f1d 100644
--- a/tests/test-regex.c
+++ b/tests/test-regex.c
@@ -120,6 +120,8 @@ static struct
   { "^a*+(.)", "ab", REG_EXTENDED, 2, { { 0, 2 }, { 1, 2 } } },
   /* Test for ** match.  */
   { "^(a*)*(.)", "ab", REG_EXTENDED, 3, { { 0, 2 }, { 0, 1 }, { 1, 2 } } },
+  /* Test for ** match with backreferences.  */
+  { "^(a*)*\\1", "a", REG_EXTENDED, 2, { { 0, 0 }, { 0, 0 } } },
 };
 
 static void


-- 
ldv



* Re: [PATCH] regex: fix match with possessive quantifier
  2021-06-06 21:45 ` Dmitry V. Levin
  2021-06-07  1:10   ` Dmitry V. Levin
@ 2021-06-21 21:09   ` Paul Eggert
  2021-06-22 15:35     ` Egor Ignatov
  1 sibling, 1 reply; 13+ messages in thread
From: Paul Eggert @ 2021-06-21 21:09 UTC (permalink / raw)
  To: Dmitry V. Levin, Egor Ignatov; +Cc: bug-gnulib

On 6/6/21 2:45 PM, Dmitry V. Levin wrote:
> I've applied it now, and the following follow-up patch:

These recently-installed patches fail for me, which indicates that the 
patches aren't correct. Could you please fix this? The test failure is 
causing 'make check' to fail for GNU grep.

I tested on Ubuntu 21.04 x86-64 (with its packaged GCC 10.3.0-1ubuntu1), 
as follows:

./gnulib-tool --create-testdir --dir foo regex
cd foo
./configure CFLAGS='-g3 -O2 -fsanitize=undefined'
make check

foo/gltests/test-regex.log contains:

regex_internal.c:1317:7: runtime error: execution reached an unreachable 
program point
FAIL test-regex (exit status: 1)

The failing line is here:

       DEBUG_ASSERT (set->elems[idx - 1] < elem);

I got a similar failure when I configured this way instead:

./configure CFLAGS='-g3 -O2 -DDEBUG'

the only difference being a different diagnostic in test-regex.log:

test-regex: regex_internal.c:1317: re_node_set_insert: Assertion 
`set->elems[idx - 1] < elem' failed.
FAIL test-regex (exit status: 134)

The problem is not limited to Ubuntu, as I got a similar failure on 
Fedora 34 with its packaged GCC 11.1.1 20210531 (Red Hat 11.1.1-3).



* Re: [PATCH] regex: fix match with possessive quantifier
  2021-06-21 21:09   ` Paul Eggert
@ 2021-06-22 15:35     ` Egor Ignatov
  2021-06-22 15:35       ` [PATCH] regex: fix assertion in re_node_set_insert Egor Ignatov
  0 siblings, 1 reply; 13+ messages in thread
From: Egor Ignatov @ 2021-06-22 15:35 UTC (permalink / raw)
  To: eggert; +Cc: bug-gnulib

> regex_internal.c:1317:7: runtime error: execution reached an unreachable program point
> FAIL test-regex (exit status: 1)

This problem occurs in this test:
'{ "()\\1*\\1*", "", REG_EXTENDED, 2, { { 0, 0 }, { 0, 0 } } }'
because proceed_next_node tries to insert an existing element
into the set.
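
The failing check, `set->elems[idx - 1] < elem`, expresses that the set is kept in strictly increasing order, so inserting a value that is already present violates it. The sketch below reproduces that invariant on a simplified, fixed-size set; struct idx_set and its functions are hypothetical stand-ins, not gnulib's re_node_set code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SET_CAPACITY = 16 };     /* hypothetical fixed capacity */

/* A set of node indexes kept in strictly increasing order.  */
struct idx_set
{
  size_t nelem;
  long elems[SET_CAPACITY];
};

/* Return true if ELEM is in SET (binary search).  */
bool
idx_set_contains (const struct idx_set *set, long elem)
{
  size_t lo = 0, hi = set->nelem;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (set->elems[mid] < elem)
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo < set->nelem && set->elems[lo] == elem;
}

/* Insert ELEM, which must not already be present.
   Returns false if the set is full.  */
bool
idx_set_insert (struct idx_set *set, long elem)
{
  size_t idx;

  if (set->nelem == SET_CAPACITY)
    return false;
  /* Shift larger elements one slot to the right.  */
  for (idx = set->nelem; idx > 0 && set->elems[idx - 1] > elem; idx--)
    set->elems[idx] = set->elems[idx - 1];
  /* Same shape as the failing assertion: a duplicate trips it.  */
  assert (idx == 0 || set->elems[idx - 1] < elem);
  set->elems[idx] = elem;
  set->nelem++;
  return true;
}

/* Guarded insertion: skip the insert when ELEM is already present.  */
bool
idx_set_insert_unique (struct idx_set *set, long elem)
{
  return idx_set_contains (set, elem) || idx_set_insert (set, elem);
}
```

With the guard, inserting the same index twice leaves the set unchanged instead of tripping the assertion.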



* [PATCH] regex: fix assertion in re_node_set_insert
  2021-06-22 15:35     ` Egor Ignatov
@ 2021-06-22 15:35       ` Egor Ignatov
  2021-06-22 19:41         ` Paul Eggert
  0 siblings, 1 reply; 13+ messages in thread
From: Egor Ignatov @ 2021-06-22 15:35 UTC (permalink / raw)
  To: eggert; +Cc: bug-gnulib

* lib/regexec.c (proceed_next_node): Add duplicate insertion
check for eps_via_nodes set.

Signed-off-by: Egor Ignatov <egori@altlinux.org>
---
 lib/regexec.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/regexec.c b/lib/regexec.c
index 23b984a21..c05b92783 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -1220,10 +1220,12 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
     {
       re_node_set *cur_nodes = &mctx->state_log[*pidx]->nodes;
       re_node_set *edests = &dfa->edests[node];
-      bool ok = re_node_set_insert (eps_via_nodes, node);
-      if (__glibc_unlikely (! ok))
-	return -2;
-
+      if(!re_node_set_contains (eps_via_nodes, node))
+	{
+	  bool ok = re_node_set_insert (eps_via_nodes, node);
+	  if (__glibc_unlikely (! ok))
+	    return -2;
+	}
       /* Pick a valid destination, or return -1 if none is found.  */
       Idx dest_node = -1;
       for (Idx i = 0; i < edests->nelem; i++)
-- 
2.29.3




* Re: [PATCH] regex: fix assertion in re_node_set_insert
  2021-06-22 15:35       ` [PATCH] regex: fix assertion in re_node_set_insert Egor Ignatov
@ 2021-06-22 19:41         ` Paul Eggert
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Eggert @ 2021-06-22 19:41 UTC (permalink / raw)
  To: Egor Ignatov; +Cc: bug-gnulib

Thanks, I installed that into Gnulib in your name.



* Re: [PATCH] regex: fix backreference matching
  2021-06-16 10:13       ` Dmitry V. Levin
@ 2021-06-29  8:51         ` Egor Ignatov
  2021-07-05 12:12           ` Dmitry V. Levin
  0 siblings, 1 reply; 13+ messages in thread
From: Egor Ignatov @ 2021-06-29  8:51 UTC (permalink / raw)
  To: Dmitry V. Levin; +Cc: Paul Eggert, bug-gnulib

Well, then I have a few questions about matching and capturing
groups.

1. "ab" -> "^(a*)*(.)"
From your test case I assume that:
regs[0] = [0, 2)
regs[1] = [0, 1)
regs[2] = [1, 2)

But if we add a backreference at the end:
2. "ab" -> "^(a*)*(.)\1"
check_matching matches the whole string "ab", which means that
the first group accepted 'a'. In fact it must be empty,
otherwise the backreference could not match later on.
What is the correct match here? Is check_matching wrong, and
should it match only "a" in the 2nd group (as it would with
"^(a*)(.)\1")? Or should set_regs check for this and shrink the
match?

Next,
3. "aaba" -> "^(a*)*(.)\1"
Again check_matching matches "aaba". If the first group is then
"a", where does the 2nd 'a' go?

PCRE2 saves an empty string for optional groups like "(a*)*".
I assume this is because a capturing group saves its last match,
and the last iteration matches the empty string. So in this case
PCRE2 would match only "aab".

So please tell me how all 3 cases should match. This will
help me fix the initial issue with backreferences and implement
the correct matching.

Thanks.

-- 
Egor




* Re: [PATCH] regex: fix backreference matching
  2021-06-29  8:51         ` Egor Ignatov
@ 2021-07-05 12:12           ` Dmitry V. Levin
  2021-07-09 12:36             ` [PATCH v2] " Egor Ignatov
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry V. Levin @ 2021-07-05 12:12 UTC (permalink / raw)
  To: Egor Ignatov; +Cc: Paul Eggert, bug-gnulib

On Tue, Jun 29, 2021 at 11:51:13AM +0300, Egor Ignatov wrote:
> Well, then I have a few questions about matching and capturing
> groups.
> 
> 1. "ab" -> "^(a*)*(.)"
> From your test case I assume that:
> regs[0] = [0, 2)
> regs[1] = [0, 1)
> regs[2] = [1, 2)
> 
> But if we add a backreference at the end:
> 2. "ab" -> "^(a*)*(.)\1"
> check_matching matches the whole string "ab", which means that
> the first group accepted 'a'. In fact it must be empty,
> otherwise the backreference could not match later on.
> What is the correct match here? Is check_matching wrong, and
> should it match only "a" in the 2nd group (as it would with
> "^(a*)(.)\1")? Or should set_regs check for this and shrink the
> match?

My test-regex.c entry for a similar but a bit simplified case was:
  /* Test for ** match with backreferences.  */
  { "^(a*)*\\1", "a", REG_EXTENDED, 2, { { 0, 0 }, { 0, 0 } } }

I suppose the corresponding entry for your example would be
  { "^(a*)*(.)\\1", "ab", REG_EXTENDED, 3, { { 0, 1 }, { 0, 0 }, { 0, 1 } } }

> Next,
> 3. "aaba" -> "^(a*)*(.)\1"
> Again check_matching matches "aaba". If the first group is then
> "a", where does the 2nd 'a' go?

I suppose the corresponding test-regex.c entry for this case would be
  { "^(a*)*(.)\\1", "aaba", REG_EXTENDED, 3, { { 0, 4 }, { 0, 1 }, { 2, 3 } } }


-- 
ldv
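
Entries of this shape can be checked against any POSIX regex implementation with a small table-driven helper. The sketch below is an assumption about how such a check could look, not the code in tests/test-regex.c; struct regex_case, struct span and regex_case_passes are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_REGS = 4 };          /* hypothetical upper bound on registers */

/* Expected [so, eo) offsets of one register.  */
struct span
{
  int so, eo;
};

/* One test entry: pattern, subject string, regcomp flags, number of
   registers to check, and the expected registers.  */
struct regex_case
{
  const char *pattern;
  const char *string;
  int cflags;
  size_t nmatch;
  struct span expected[MAX_REGS];
};

/* Return true if TC's pattern compiles, matches TC's string, and
   reports exactly the expected registers.  */
bool
regex_case_passes (const struct regex_case *tc)
{
  regex_t re;
  regmatch_t regs[MAX_REGS];
  bool ok = true;

  assert (tc->nmatch <= MAX_REGS);
  if (regcomp (&re, tc->pattern, tc->cflags) != 0)
    return false;
  if (regexec (&re, tc->string, tc->nmatch, regs, 0) != 0)
    ok = false;
  for (size_t i = 0; ok && i < tc->nmatch; i++)
    ok = (regs[i].rm_so == tc->expected[i].so
          && regs[i].rm_eo == tc->expected[i].eo);
  regfree (&re);
  return ok;
}
```

Note that a backreference in a C string literal needs a doubled backslash ("\\1"). Whether the two entries proposed above pass depends on the regex implementation being tested.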



* [PATCH v2] regex: fix backreference matching
  2021-07-05 12:12           ` Dmitry V. Levin
@ 2021-07-09 12:36             ` Egor Ignatov
  0 siblings, 0 replies; 13+ messages in thread
From: Egor Ignatov @ 2021-07-09 12:36 UTC (permalink / raw)
  To: ldv; +Cc: eggert, bug-gnulib

* lib/regexec.c (proceed_next_node): Disable the dest_node check
if the regexp has back-references.
(set_regs): Finish when we are at the last node and all regs
have been set.  Also shrink the match if we are ready to finish
but have not accepted the entire string matched by check_matching,
because check_matching may return a wrong match for a regexp with
back-references.  For example, check_matching on regex
'(a*)*(.)\1' and string 'ab' yields the match 'ab', where it
should be just 'a', captured by the second group.

All built-in tests, as well as the tests from sed and grep, have passed.

Signed-off-by: Egor Ignatov <egori@altlinux.org>
---
 lib/regexec.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/lib/regexec.c b/lib/regexec.c
index 5e4eb497a..8f0f14575 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -1233,7 +1233,7 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
       for (Idx i = 0; i < edests->nelem; i++)
 	{
 	  Idx candidate = edests->elems[i];
-	  if (!re_node_set_contains (cur_nodes, candidate))
+	  if (!dfa->nbackref && !re_node_set_contains (cur_nodes, candidate))
 	    continue;
           if (dest_node == -1)
 	    dest_node = candidate;
@@ -1296,9 +1296,7 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
 	      if (__glibc_unlikely (! ok))
 		return -2;
 	      dest_node = dfa->edests[node].elems[0];
-	      if (re_node_set_contains (&mctx->state_log[*pidx]->nodes,
-					dest_node))
-		return dest_node;
+	      return dest_node;
 	    }
 	}
 
@@ -1308,8 +1306,9 @@ proceed_next_node (const re_match_context_t *mctx, Idx nregs, regmatch_t *regs,
 	  Idx dest_node = dfa->nexts[node];
 	  *pidx = (naccepted == 0) ? *pidx + 1 : *pidx + naccepted;
 	  if (fs && (*pidx > mctx->match_last || mctx->state_log[*pidx] == NULL
-		     || !re_node_set_contains (&mctx->state_log[*pidx]->nodes,
-					       dest_node)))
+		     || (!dfa->nbackref &&
+			 !re_node_set_contains (&mctx->state_log[*pidx]->nodes,
+						dest_node))))
 	    return -1;
 	  re_node_set_empty (eps_via_nodes);
 	  return dest_node;
@@ -1417,8 +1416,7 @@ set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
     {
       update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
 
-      if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
-	  || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
+      if (cur_node == mctx->last_node)
 	{
 	  Idx reg_idx;
 	  cur_node = -1;
@@ -1434,6 +1432,7 @@ set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
 	    }
 	  if (cur_node < 0)
 	    {
+	      pmatch[0].rm_eo = idx;
 	      re_node_set_free (&eps_via_nodes);
 	      regmatch_list_free (&prev_match);
 	      return free_fail_stack_return (fs);
-- 
2.29.3



