From: arnold@skeeve.com
Date: Sun, 18 Jul 2021 13:30:57 -0600
To: bug-gnulib@gnu.org, bruno@clisp.org
Cc: eggert@cs.ucla.edu, arnold@skeeve.com
Subject: Re: possible bug in regex and dfa
Message-Id: <202107181930.16IJUvI3011225@freefriends.org>
In-Reply-To: <3323531.JAME3IizvO@omega>
References: <85ef7fe3-c793-f082-3df1-3011fd8d0966@cs.ucla.edu>
 <202107181256.16ICuEjF027369@freefriends.org>
 <3323531.JAME3IizvO@omega>

Hi.

Bruno Haible wrote:

> - if REG_NEWLINE is not set, '.' matches newline but '^' does not match
>   after the newline.

This is indeed the desired behavior, but regex isn't following it.

REG_NEWLINE being set gets translated into preg->newline_anchor.
The code in regexec.c that deals with it starts at line 620:

|  /* If initial states with non-begbuf contexts have no elements,
|     the regex must be anchored.  If preg->newline_anchor is set,
|     we'll never use init_state_nl, so do not check it.  */
|  if (dfa->init_state->nodes.nelem == 0
|      && dfa->init_state_word->nodes.nelem == 0
|      && (dfa->init_state_nl->nodes.nelem == 0
|          || !preg->newline_anchor))
|    {
|      if (start != 0 && last_start != 0)
|        return REG_NOMATCH;
|      start = last_start = 0;
|    }

(As a side note, I don't think the comment matches the code.)

In my case, preg->newline_anchor is zero (correctly), but
dfa->init_state->nodes.nelem is nonzero, so this body isn't executed.

Making the test for preg->newline_anchor the first thing causes my test
case to work correctly but breaks the gawk test suite.
In other words, I think the bug is somewhere in this area, but I don't
understand the regex internals enough to fix it.

dfa will also need looking at.

Thanks,

Arnold
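
For readers who want to see the semantics under discussion in isolation,
the following is a minimal sketch, assuming the plain POSIX
regcomp()/regexec() interface and a single two-line subject string.
It only exercises the simple cases of the rule quoted from Bruno
(REG_NEWLINE not set) plus the documented POSIX behavior when
REG_NEWLINE is set. It is not the failing test case from this report,
which is not shown here, so it should not be expected to trigger the
bug. The names matches() and check_newline_semantics() are hypothetical
helpers made up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN matches anywhere in TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  CFLAGS is
   passed through to regcomp (for example 0 or REG_NEWLINE).  */
int
matches (const char *pattern, int cflags, const char *text)
{
  regex_t re;

  /* REG_NOSUB: only a yes/no answer is needed, not match offsets.  */
  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return -1;

  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* Hypothetical self-check of the expected semantics on "a\nb".  */
void
check_newline_semantics (void)
{
  const char *subject = "a\nb";

  /* REG_NEWLINE not set: '.' matches the newline, and '^' matches
     only at the start of the string, not after the newline.  */
  assert (matches ("a.b", 0, subject) == 1);
  assert (matches ("^b", 0, subject) == 0);

  /* REG_NEWLINE set: '.' does not match the newline, and '^'
     matches immediately after it.  */
  assert (matches ("a.b", REG_NEWLINE, subject) == 0);
  assert (matches ("^b", REG_NEWLINE, subject) == 1);
}
```

Calling check_newline_semantics() from a main() should complete silently
when the library follows these rules for the simple patterns above. A
pattern for which dfa->init_state is non-empty would be needed to reach
the code path discussed in this message.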