bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed
* Re: Changed behavior in sed 4.6
       [not found] <20181220184119.3jb6iakjsmeatja3@kalarepa>
@ 2018-12-21  3:27 ` Jim Meyering
  2018-12-21  5:13   ` [Grep-devel] " arnold
  0 siblings, 1 reply; 6+ messages in thread
From: Jim Meyering @ 2018-12-21  3:27 UTC (permalink / raw)
  To: atler, GNU grep developers, bug-gnulib@gnu.org List,
	Norihiro Tanaka
  Cc: sed-devel

On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <atler@pld-linux.org> wrote:
> I've just happened to notice a difference in behavior between sed 4.5 and 4.6
> when building VirtualBox. It seems to be locale dependent:
>
> $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(bar
>
> $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(foo *
>
> In 4.5 both results are the same -- same as the second output with
> LC_ALL=C.UTF-8.

Thanks a lot for that report.
This is indeed a regression. It also affects the just-release
grep-3.2, since the source is in a file used by both: gnulib's dfa.c.
I tracked it down to this gnulib/lib/dfa.c commit: v0.1-2213-gae4b73e28
To back that out, I must first revert part of this fix-up patch:
v0.1-2281-g95cd86dd7

Here's a demonstrator with grep: (it should match, but with 3.2, does not):

$ echo 123-x|LC_ALL=C grep '.\bx'
$

To avoid the failure, one can:
- specify -P (for PCRE, a different matcher), or
- don't use the C locale, but rather use a multi-byte locale like the
one you chose, which inhibits use of the DFA matcher, because \b's
definition requires multi-byte aware machinery not present in the DFA
matcher.

I expect to revert the mentioned mentioned gnulib commits, and then to
make new releases of both grep and sed.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Grep-devel] Changed behavior in sed 4.6
  2018-12-21  3:27 ` Changed behavior in sed 4.6 Jim Meyering
@ 2018-12-21  5:13   ` arnold
  2018-12-21  5:18     ` Jim Meyering
  0 siblings, 1 reply; 6+ messages in thread
From: arnold @ 2018-12-21  5:13 UTC (permalink / raw)
  To: noritnk, jim, grep-devel, bug-gnulib, atler; +Cc: sed-devel

Jim Meyering <jim@meyering.net> wrote:

> On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <atler@pld-linux.org> wrote:
> > I've just happened to notice a difference in behavior between sed 4.5 and 4.6
> > when building VirtualBox. It seems to be locale dependent:
> >
> > $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(bar
> >
> > $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(foo *
> >
> > In 4.5 both results are the same -- same as the second output with
> > LC_ALL=C.UTF-8.
>
> Thanks a lot for that report.
> This is indeed a regression. It also affects the just-release
> grep-3.2, since the source is in a file used by both: gnulib's dfa.c.
> I tracked it down to this gnulib/lib/dfa.c commit: v0.1-2213-gae4b73e28
> To back that out, I must first revert part of this fix-up patch:
> v0.1-2281-g95cd86dd7
>
> Here's a demonstrator with grep: (it should match, but with 3.2, does not):
>
> $ echo 123-x|LC_ALL=C grep '.\bx'
> $
>
> To avoid the failure, one can:
> - specify -P (for PCRE, a different matcher), or
> - don't use the C locale, but rather use a multi-byte locale like the
> one you chose, which inhibits use of the DFA matcher, because \b's
> definition requires multi-byte aware machinery not present in the DFA
> matcher.
>
> I expect to revert the mentioned mentioned gnulib commits, and then to
> make new releases of both grep and sed.

Please add a test case ...

THanks,

Arnold


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Grep-devel] Changed behavior in sed 4.6
  2018-12-21  5:13   ` [Grep-devel] " arnold
@ 2018-12-21  5:18     ` Jim Meyering
  2018-12-21  5:30       ` Jim Meyering
  2018-12-21  9:19       ` arnold
  0 siblings, 2 replies; 6+ messages in thread
From: Jim Meyering @ 2018-12-21  5:18 UTC (permalink / raw)
  To: Aharon Robbins
  Cc: sed-devel, bug-gnulib@gnu.org List, Norihiro Tanaka, atler,
	GNU grep developers

On Thu, Dec 20, 2018 at 9:13 PM <arnold@skeeve.com> wrote:
> > I expect to revert the mentioned mentioned gnulib commits, and then to
> > make new releases of both grep and sed.
>
> Please add a test case ...

You should know me better by now.
I didn't mention the required NEWS update either.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Grep-devel] Changed behavior in sed 4.6
  2018-12-21  5:18     ` Jim Meyering
@ 2018-12-21  5:30       ` Jim Meyering
  2018-12-21  5:35         ` Jim Meyering
  2018-12-21  9:19       ` arnold
  1 sibling, 1 reply; 6+ messages in thread
From: Jim Meyering @ 2018-12-21  5:30 UTC (permalink / raw)
  To: Aharon Robbins
  Cc: sed-devel, bug-gnulib@gnu.org List, Norihiro Tanaka, atler,
	GNU grep developers

[-- Attachment #1: Type: text/plain, Size: 312 bytes --]

I've attached the patch I expect to push in less than 30 minutes, in
case someone wants to review or test quickly.
I've also created a snapshot here:

sed snapshot:
  https://meyering.net/sed/sed-ss.tar.xz      1.3 MB
  https://meyering.net/sed/sed-ss.tar.xz.sig
  https://meyering.net/sed/sed-4.6.2-799c.tar.xz

[-- Attachment #2: sed-dfa-fix.diff --]
[-- Type: application/octet-stream, Size: 2821 bytes --]

From 799ce52011f81607bb2d8aba357112a8682a4118 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@fb.com>
Date: Thu, 20 Dec 2018 20:54:26 -0800
Subject: [PATCH] sed: fix \b DFA-bug in C locale

Under some conditions, \b would mistakenly fail to match. E.g.,
this would mistakenly print "123-x" instead of "123":
  echo 123-x|LC_ALL=C sed 's/.\bx//'
* NEWS (Bug fixes): Mention it
* gnulib: Update to latest, for DFA regression fix.
* testsuite/word-delim.sh: New file, to test for the dfa.c regression.
* testsuite/local.mk (T): Add it.
Reported by Jan Palus in
https://lists.gnu.org/r/sed-devel/2018-12/msg00022.html
---
 NEWS                    |  8 ++++++++
 gnulib                  |  2 +-
 testsuite/local.mk      |  1 +
 testsuite/word-delim.sh | 19 +++++++++++++++++++
 4 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100755 testsuite/word-delim.sh

diff --git a/NEWS b/NEWS
index 7bc203e..dd75aee 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,14 @@ GNU sed NEWS                                    -*- outline -*-

 * Noteworthy changes in release ?.? (????-??-??) [?]

+** Bug fixes
+
+  Some uses of \b in the C locale and with the DFA matcher would fail, e.g.,
+  the following would mistakenly print "123-x" instead of "123":
+    echo 123-x|LC_ALL=C sed 's/.\bx//'
+  Using a multibyte locale or certain regexp constructs (some ranges,
+  backreferences) would avoid the bug.  [bug introduced in sed 4.6]
+

 * Noteworthy changes in release 4.6 (2018-12-19) [stable]

diff --git a/gnulib b/gnulib
index 453f37e..5d6a3cd 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 453f37e2b6364cb5fdcd79f9330b962c88daab9f
+Subproject commit 5d6a3cdd5c312e77a6d0f0848e3cb79a52e08658
diff --git a/testsuite/local.mk b/testsuite/local.mk
index 8213b06..43623bf 100644
--- a/testsuite/local.mk
+++ b/testsuite/local.mk
@@ -112,6 +112,7 @@ T += testsuite/8bit.sh			\
      testsuite/stdin.sh                 \
      testsuite/utf8-ru.sh		\
      testsuite/uniq.sh			\
+     testsuite/word-delim.sh		\
      testsuite/xemacs.sh

 TESTS = $(SEDTESTS) $(T)
diff --git a/testsuite/word-delim.sh b/testsuite/word-delim.sh
new file mode 100755
index 0000000..ade3137
--- /dev/null
+++ b/testsuite/word-delim.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+# Exercise the DFA regression in sed-4.6.
+. "${srcdir=.}/testsuite/init.sh"; path_prepend_ ./sed
+print_ver_ sed
+
+require_en_utf8_locale_
+
+# Also ensure that this works in both the C locale and that multibyte one.
+# In the C locale, it failed due to a dfa.c regression in sed-4.6.
+echo 123-x > in || framework_failure_
+echo 123 > exp || framework_failure_
+
+for locale in C en_US.UTF-8; do
+  LC_ALL=$locale sed 's/.\bx//' in > out 2>err || fail=1
+  compare exp out || fail=1
+  compare /dev/null err || fail=1
+done
+
+Exit $fail
-- 
2.20.1.2.gb21ebb671b


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Grep-devel] Changed behavior in sed 4.6
  2018-12-21  5:30       ` Jim Meyering
@ 2018-12-21  5:35         ` Jim Meyering
  0 siblings, 0 replies; 6+ messages in thread
From: Jim Meyering @ 2018-12-21  5:35 UTC (permalink / raw)
  To: Aharon Robbins
  Cc: sed-devel, bug-gnulib@gnu.org List, Norihiro Tanaka, atler,
	GNU grep developers

On Thu, Dec 20, 2018 at 9:30 PM Jim Meyering <jim@meyering.net> wrote:
> I've attached the patch I expect to push in less than 30 minutes, in
> case someone wants to review or test quickly.
> I've also created a snapshot here:
>
> sed snapshot:
>   https://meyering.net/sed/sed-ss.tar.xz      1.3 MB
>   https://meyering.net/sed/sed-ss.tar.xz.sig
>   https://meyering.net/sed/sed-4.6.2-799c.tar.xz

To be clear, I'm releasing sed-4.7 in about 30 minutes.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Grep-devel] Changed behavior in sed 4.6
  2018-12-21  5:18     ` Jim Meyering
  2018-12-21  5:30       ` Jim Meyering
@ 2018-12-21  9:19       ` arnold
  1 sibling, 0 replies; 6+ messages in thread
From: arnold @ 2018-12-21  9:19 UTC (permalink / raw)
  To: jim, arnold; +Cc: bug-gnulib, sed-devel, noritnk, atler, grep-devel

Jim Meyering <jim@meyering.net> wrote:

> On Thu, Dec 20, 2018 at 9:13 PM <arnold@skeeve.com> wrote:
> > > I expect to revert the mentioned mentioned gnulib commits, and then to
> > > make new releases of both grep and sed.
> >
> > Please add a test case ...
>
> You should know me better by now.
> I didn't mention the required NEWS update either.

Indeed, I knew I was requesting the obvious, but sometime people forget ...

In any case, much thanks!

Arnold


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-12-21  9:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20181220184119.3jb6iakjsmeatja3@kalarepa>
2018-12-21  3:27 ` Changed behavior in sed 4.6 Jim Meyering
2018-12-21  5:13   ` [Grep-devel] " arnold
2018-12-21  5:18     ` Jim Meyering
2018-12-21  5:30       ` Jim Meyering
2018-12-21  5:35         ` Jim Meyering
2018-12-21  9:19       ` arnold

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).