* Re: Changed behavior in sed 4.6
[not found] <20181220184119.3jb6iakjsmeatja3@kalarepa>
@ 2018-12-21 3:27 ` Jim Meyering
2018-12-21 5:13 ` [Grep-devel] " arnold
0 siblings, 1 reply; 6+ messages in thread
From: Jim Meyering @ 2018-12-21 3:27 UTC (permalink / raw)
To: atler, GNU grep developers, bug-gnulib@gnu.org List,
Norihiro Tanaka
Cc: sed-devel
On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <atler@pld-linux.org> wrote:
> I've just happened to notice a difference in behavior between sed 4.5 and 4.6
> when building VirtualBox. It seems to be locale dependent:
>
> $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(bar
>
> $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(foo *
>
> In 4.5 both results are the same -- same as the second output with
> LC_ALL=C.UTF-8.
Thanks a lot for that report.
This is indeed a regression. It also affects the just-release
grep-3.2, since the source is in a file used by both: gnulib's dfa.c.
I tracked it down to this gnulib/lib/dfa.c commit: v0.1-2213-gae4b73e28
To back that out, I must first revert part of this fix-up patch:
v0.1-2281-g95cd86dd7
Here's a demonstrator with grep: (it should match, but with 3.2, does not):
$ echo 123-x|LC_ALL=C grep '.\bx'
$
To avoid the failure, one can:
- specify -P (for PCRE, a different matcher), or
- don't use the C locale, but rather use a multi-byte locale like the
one you chose, which inhibits use of the DFA matcher, because \b's
definition requires multi-byte aware machinery not present in the DFA
matcher.
I expect to revert the mentioned mentioned gnulib commits, and then to
make new releases of both grep and sed.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Grep-devel] Changed behavior in sed 4.6
2018-12-21 3:27 ` Changed behavior in sed 4.6 Jim Meyering
@ 2018-12-21 5:13 ` arnold
2018-12-21 5:18 ` Jim Meyering
0 siblings, 1 reply; 6+ messages in thread
From: arnold @ 2018-12-21 5:13 UTC (permalink / raw)
To: noritnk, jim, grep-devel, bug-gnulib, atler; +Cc: sed-devel
Jim Meyering <jim@meyering.net> wrote:
> On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <atler@pld-linux.org> wrote:
> > I've just happened to notice a difference in behavior between sed 4.5 and 4.6
> > when building VirtualBox. It seems to be locale dependent:
> >
> > $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(bar
> >
> > $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(foo *
> >
> > In 4.5 both results are the same -- same as the second output with
> > LC_ALL=C.UTF-8.
>
> Thanks a lot for that report.
> This is indeed a regression. It also affects the just-release
> grep-3.2, since the source is in a file used by both: gnulib's dfa.c.
> I tracked it down to this gnulib/lib/dfa.c commit: v0.1-2213-gae4b73e28
> To back that out, I must first revert part of this fix-up patch:
> v0.1-2281-g95cd86dd7
>
> Here's a demonstrator with grep: (it should match, but with 3.2, does not):
>
> $ echo 123-x|LC_ALL=C grep '.\bx'
> $
>
> To avoid the failure, one can:
> - specify -P (for PCRE, a different matcher), or
> - don't use the C locale, but rather use a multi-byte locale like the
> one you chose, which inhibits use of the DFA matcher, because \b's
> definition requires multi-byte aware machinery not present in the DFA
> matcher.
>
> I expect to revert the mentioned mentioned gnulib commits, and then to
> make new releases of both grep and sed.
Please add a test case ...
THanks,
Arnold
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Grep-devel] Changed behavior in sed 4.6
2018-12-21 5:13 ` [Grep-devel] " arnold
@ 2018-12-21 5:18 ` Jim Meyering
2018-12-21 5:30 ` Jim Meyering
2018-12-21 9:19 ` arnold
0 siblings, 2 replies; 6+ messages in thread
From: Jim Meyering @ 2018-12-21 5:18 UTC (permalink / raw)
To: Aharon Robbins
Cc: sed-devel, bug-gnulib@gnu.org List, Norihiro Tanaka, atler,
GNU grep developers
On Thu, Dec 20, 2018 at 9:13 PM <arnold@skeeve.com> wrote:
> > I expect to revert the mentioned mentioned gnulib commits, and then to
> > make new releases of both grep and sed.
>
> Please add a test case ...
You should know me better by now.
I didn't mention the required NEWS update either.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Grep-devel] Changed behavior in sed 4.6
2018-12-21 5:18 ` Jim Meyering
@ 2018-12-21 5:30 ` Jim Meyering
2018-12-21 5:35 ` Jim Meyering
2018-12-21 9:19 ` arnold
1 sibling, 1 reply; 6+ messages in thread
From: Jim Meyering @ 2018-12-21 5:30 UTC (permalink / raw)
To: Aharon Robbins
Cc: sed-devel, bug-gnulib@gnu.org List, Norihiro Tanaka, atler,
GNU grep developers
[-- Attachment #1: Type: text/plain, Size: 312 bytes --]
I've attached the patch I expect to push in less than 30 minutes, in
case someone wants to review or test quickly.
I've also created a snapshot here:
sed snapshot:
https://meyering.net/sed/sed-ss.tar.xz 1.3 MB
https://meyering.net/sed/sed-ss.tar.xz.sig
https://meyering.net/sed/sed-4.6.2-799c.tar.xz
[-- Attachment #2: sed-dfa-fix.diff --]
[-- Type: application/octet-stream, Size: 2821 bytes --]
From 799ce52011f81607bb2d8aba357112a8682a4118 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@fb.com>
Date: Thu, 20 Dec 2018 20:54:26 -0800
Subject: [PATCH] sed: fix \b DFA-bug in C locale
Under some conditions, \b would mistakenly fail to match. E.g.,
this would mistakenly print "123-x" instead of "123":
echo 123-x|LC_ALL=C sed 's/.\bx//'
* NEWS (Bug fixes): Mention it
* gnulib: Update to latest, for DFA regression fix.
* testsuite/word-delim.sh: New file, to test for the dfa.c regression.
* testsuite/local.mk (T): Add it.
Reported by Jan Palus in
https://lists.gnu.org/r/sed-devel/2018-12/msg00022.html
---
NEWS | 8 ++++++++
gnulib | 2 +-
testsuite/local.mk | 1 +
testsuite/word-delim.sh | 19 +++++++++++++++++++
4 files changed, 29 insertions(+), 1 deletion(-)
create mode 100755 testsuite/word-delim.sh
diff --git a/NEWS b/NEWS
index 7bc203e..dd75aee 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,14 @@ GNU sed NEWS -*- outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
+** Bug fixes
+
+ Some uses of \b in the C locale and with the DFA matcher would fail, e.g.,
+ the following would mistakenly print "123-x" instead of "123":
+ echo 123-x|LC_ALL=C sed 's/.\bx//'
+ Using a multibyte locale or certain regexp constructs (some ranges,
+ backreferences) would avoid the bug. [bug introduced in sed 4.6]
+
* Noteworthy changes in release 4.6 (2018-12-19) [stable]
diff --git a/gnulib b/gnulib
index 453f37e..5d6a3cd 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 453f37e2b6364cb5fdcd79f9330b962c88daab9f
+Subproject commit 5d6a3cdd5c312e77a6d0f0848e3cb79a52e08658
diff --git a/testsuite/local.mk b/testsuite/local.mk
index 8213b06..43623bf 100644
--- a/testsuite/local.mk
+++ b/testsuite/local.mk
@@ -112,6 +112,7 @@ T += testsuite/8bit.sh \
testsuite/stdin.sh \
testsuite/utf8-ru.sh \
testsuite/uniq.sh \
+ testsuite/word-delim.sh \
testsuite/xemacs.sh
TESTS = $(SEDTESTS) $(T)
diff --git a/testsuite/word-delim.sh b/testsuite/word-delim.sh
new file mode 100755
index 0000000..ade3137
--- /dev/null
+++ b/testsuite/word-delim.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+# Exercise the DFA regression in sed-4.6.
+. "${srcdir=.}/testsuite/init.sh"; path_prepend_ ./sed
+print_ver_ sed
+
+require_en_utf8_locale_
+
+# Also ensure that this works in both the C locale and that multibyte one.
+# In the C locale, it failed due to a dfa.c regression in sed-4.6.
+echo 123-x > in || framework_failure_
+echo 123 > exp || framework_failure_
+
+for locale in C en_US.UTF-8; do
+ LC_ALL=$locale sed 's/.\bx//' in > out 2>err || fail=1
+ compare exp out || fail=1
+ compare /dev/null err || fail=1
+done
+
+Exit $fail
--
2.20.1.2.gb21ebb671b
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Grep-devel] Changed behavior in sed 4.6
2018-12-21 5:30 ` Jim Meyering
@ 2018-12-21 5:35 ` Jim Meyering
0 siblings, 0 replies; 6+ messages in thread
From: Jim Meyering @ 2018-12-21 5:35 UTC (permalink / raw)
To: Aharon Robbins
Cc: sed-devel, bug-gnulib@gnu.org List, Norihiro Tanaka, atler,
GNU grep developers
On Thu, Dec 20, 2018 at 9:30 PM Jim Meyering <jim@meyering.net> wrote:
> I've attached the patch I expect to push in less than 30 minutes, in
> case someone wants to review or test quickly.
> I've also created a snapshot here:
>
> sed snapshot:
> https://meyering.net/sed/sed-ss.tar.xz 1.3 MB
> https://meyering.net/sed/sed-ss.tar.xz.sig
> https://meyering.net/sed/sed-4.6.2-799c.tar.xz
To be clear, I'm releasing sed-4.7 in about 30 minutes.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Grep-devel] Changed behavior in sed 4.6
2018-12-21 5:18 ` Jim Meyering
2018-12-21 5:30 ` Jim Meyering
@ 2018-12-21 9:19 ` arnold
1 sibling, 0 replies; 6+ messages in thread
From: arnold @ 2018-12-21 9:19 UTC (permalink / raw)
To: jim, arnold; +Cc: bug-gnulib, sed-devel, noritnk, atler, grep-devel
Jim Meyering <jim@meyering.net> wrote:
> On Thu, Dec 20, 2018 at 9:13 PM <arnold@skeeve.com> wrote:
> > > I expect to revert the mentioned mentioned gnulib commits, and then to
> > > make new releases of both grep and sed.
> >
> > Please add a test case ...
>
> You should know me better by now.
> I didn't mention the required NEWS update either.
Indeed, I knew I was requesting the obvious, but sometime people forget ...
In any case, much thanks!
Arnold
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-12-21 9:30 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20181220184119.3jb6iakjsmeatja3@kalarepa>
2018-12-21 3:27 ` Changed behavior in sed 4.6 Jim Meyering
2018-12-21 5:13 ` [Grep-devel] " arnold
2018-12-21 5:18 ` Jim Meyering
2018-12-21 5:30 ` Jim Meyering
2018-12-21 5:35 ` Jim Meyering
2018-12-21 9:19 ` arnold
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).