From: Jim Meyering
Date: Thu, 20 Dec 2018 19:27:21 -0800
Subject: Re: Changed behavior in sed 4.6
To: atler@pld-linux.org, GNU grep developers, "bug-gnulib@gnu.org List", Norihiro Tanaka
Cc: sed-devel@gnu.org
References: <20181220184119.3jb6iakjsmeatja3@kalarepa>
In-Reply-To: <20181220184119.3jb6iakjsmeatja3@kalarepa>
List-Id: Gnulib discussion list

On Thu, Dec 20, 2018 at 2:49 PM Jan Palus wrote:
> I've just happened to notice a difference in behavior between sed 4.5 and 4.6
> when building VirtualBox. It seems to be locale dependent:
>
> $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(bar
>
> $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(foo *
>
> In 4.5 both results are the same -- same as the second output with
> LC_ALL=C.UTF-8.

Thanks a lot for that report. This is indeed a regression.
It also affects the just-released grep-3.2, since the source of the
problem is in a file used by both: gnulib's dfa.c.
I tracked it down to this gnulib/lib/dfa.c commit:
  v0.1-2213-gae4b73e28
To back that out, I must first revert part of this fix-up patch:
  v0.1-2281-g95cd86dd7

Here's a demonstrator with grep (it should match, but with 3.2 it does not):

  $ echo 123-x|LC_ALL=C grep '.\bx'
  $

To avoid the failure, one can:
  - specify -P (for PCRE, a different matcher), or
  - avoid the C locale and use a multi-byte locale like the one you chose,
    which inhibits use of the DFA matcher, because \b's definition requires
    multi-byte aware machinery not present in the DFA matcher.

I expect to revert the mentioned gnulib commits, and then to make
new releases of both grep and sed.
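
For anyone who wants to check a given installation, here is a minimal sketch that runs the grep demonstrator in the C locale, once with the default matcher and once with -P. It assumes a POSIX shell and a GNU grep on the PATH; the `check` helper and its labels are made up for illustration and are not part of grep or gnulib. If grep was built without PCRE support, the -P run should land in the error branch rather than report a match.

```shell
#!/bin/sh
# Sketch only: report whether grep matches the demonstrator in the C locale.
# Assumption: "grep" on the PATH is GNU grep (\b is a GNU extension).

input='123-x'
pattern='.\bx'

# check LABEL [GREP-OPTIONS...]
# Hypothetical helper: runs grep on the demonstrator input under LC_ALL=C
# and prints one line describing grep's exit status.
check() {
  label=$1
  shift
  printf '%s\n' "$input" | LC_ALL=C grep "$@" -- "$pattern" >/dev/null 2>&1
  case $? in
    0) echo "$label: match" ;;
    1) echo "$label: no match" ;;
    *) echo "$label: grep error (option unsupported?)" ;;
  esac
}

check "default matcher"   # an affected build reports "no match" here
check "PCRE (-P)" -P      # the first workaround: a different matcher
```

A "no match" on the first line together with a "match" on the second would be consistent with the regression described above; a "match" on both suggests the installed grep is not affected.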