bug-gnulib@gnu.org mirror (unofficial)
From: Egor Ignatov <egori@altlinux.org>
To: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>, bug-gnulib@gnu.org
Subject: Re: [PATCH] regex: fix backreference matching
Date: Tue, 29 Jun 2021 11:51:13 +0300
Message-ID: <85975173-4e58-1402-00c8-8d065b967f99@altlinux.org>
In-Reply-To: <20210616101339.GA8379@altlinux.org>

Well, then I have a few questions about matching and capturing
groups.

1. "ab" -> "^(a*)*(.)"
So, from your test case I can assume that:
regs[0] = [0, 2)
regs[1] = [0, 1)
regs[2] = [1, 2)

But if we add a backreference at the end:
2. "ab" -> "^(a*)*(.)\1"
check_matching matches the whole string "ab". This means that the
first group accepted 'a' but in fact is empty, since otherwise the
backreference could not match later on.
What is the correct match here? Is check_matching wrong, and should
it match only "a", captured by the 2nd group (as it would be with
"^(a*)(.)\1")? Or should set_regs check for this and shrink the
match?

Next,
3. "aaba" -> "^(a*)*(.)\1"
Again check_matching matches "aaba". Then the first group
is "a", so where does the 2nd 'a' go?

In PCRE2 they save an empty string for optional groups like
"(a*)*", and I assume this is because a capturing group saves its
last match, and the empty string matches. So in this case they
would match only "aab".

So please tell me how all 3 cases should match. This will
help me fix the initial issue with backrefs and implement the
correct matching.
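
For reference, the three cases above can be checked through the POSIX interface with something like the sketch below. It is only an illustration: the helper names (run_case, print_case, print_all_cases) are made up for this message, the patterns are compiled with REG_EXTENDED, and "\1" in an extended regex is a GNU extension, so other libcs may reject cases 2 and 3 at regcomp time. The registers are printed as half-open byte ranges, and no expected output is given because that is exactly what is in question.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

#define MAX_REGS 3  /* whole match plus two capturing groups */

/* Hypothetical helper: compile PATTERN as an ERE, match it against
   SUBJECT and store the offsets in REGS.  Returns the regexec result
   (0 on match, REG_NOMATCH otherwise), or -1 if regcomp fails.  */
int
run_case (const char *pattern, const char *subject,
          regmatch_t regs[MAX_REGS])
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  /* re_nsub counts parenthesized groups; the patterns here have two.  */
  assert (re.re_nsub + 1 <= MAX_REGS);
  int rc = regexec (&re, subject, MAX_REGS, regs, 0);
  regfree (&re);
  return rc;
}

/* Print every register as a half-open byte range [rm_so, rm_eo).  */
void
print_case (const char *pattern, const char *subject)
{
  regmatch_t regs[MAX_REGS];
  int rc = run_case (pattern, subject, regs);
  printf ("\"%s\" -> \"%s\": ", subject, pattern);
  if (rc != 0)
    {
      puts (rc == REG_NOMATCH ? "no match" : "error");
      return;
    }
  for (int i = 0; i < MAX_REGS; i++)
    printf ("regs[%d] = [%d, %d)  ", i,
            (int) regs[i].rm_so, (int) regs[i].rm_eo);
  putchar ('\n');
}

/* Run the three cases discussed in this message.  */
void
print_all_cases (void)
{
  print_case ("^(a*)*(.)", "ab");
  print_case ("^(a*)*(.)\\1", "ab");
  print_case ("^(a*)*(.)\\1", "aaba");
}
```

Calling print_all_cases () from main shows what a given regex implementation currently reports for regs[0] through regs[2] in each case, which makes it easy to compare a patched build against an unpatched one.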

Thanks.

-- 
Egor



