From: Adhemerval Zanella
To: libc-alpha@sourceware.org, Paul Eggert
Subject: [PATCH 1/5] posix: Sync regex code with gnulib
Date: Wed, 30 Dec 2020 17:15:03 -0300
Message-Id: <20201230201507.2755086-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.25.1
Cc: bug-gnulib@gnu.org

It syncs with gnulib commit 43ee1a6bf.  The main change is 9682f18e9
(which does not have a meaningful description).

Checked on x86_64-linux-gnu.
---
 posix/regcomp.c        |  2 +-
 posix/regex.h          | 17 ++++++++++++-----
 posix/regex_internal.c | 19 ++++++++++---------
 posix/regex_internal.h | 16 ++++++++++++----
 4 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/posix/regcomp.c b/posix/regcomp.c
index 93bb0a0538..692928b0db 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -558,7 +558,7 @@ weak_alias (__regerror, regerror)
 static const bitset_t utf8_sb_map =
 {
   /* Set the first 128 bits.  */
-# if defined __GNUC__ && !defined __STRICT_ANSI__
+# if (defined __GNUC__ || __clang_major__ >= 4) && !defined __STRICT_ANSI__
   [0 ... 0x80 / BITSET_WORD_BITS - 1] = BITSET_WORD_MAX
 # else
 #  if 4 * BITSET_WORD_BITS < ASCII_CHARS
diff --git a/posix/regex.h b/posix/regex.h
index 5fe41c8685..7418e6c76f 100644
--- a/posix/regex.h
+++ b/posix/regex.h
@@ -612,7 +612,9 @@ extern int re_exec (const char *);
    'configure' might #define 'restrict' to those words, so pick a
    different name.  */
 #ifndef _Restrict_
-# if defined __restrict || 2 < __GNUC__ + (95 <= __GNUC_MINOR__)
+# if defined __restrict \
+     || 2 < __GNUC__ + (95 <= __GNUC_MINOR__) \
+     || __clang_major__ >= 3
 #  define _Restrict_ __restrict
 # elif 199901L <= __STDC_VERSION__ || defined restrict
 #  define _Restrict_ restrict
@@ -620,13 +622,18 @@ extern int re_exec (const char *);
 #  define _Restrict_
 # endif
 #endif
-/* For [restrict], use glibc's __restrict_arr if available.
-   Otherwise, GCC 3.1 (not in C++ mode) and C99 support [restrict].  */
+/* For the ISO C99 syntax
+     array_name[restrict]
+   use glibc's __restrict_arr if available.
+   Otherwise, GCC 3.1 and clang support this syntax (but not in C++ mode).
+   Other ISO C99 compilers support it as well.  */
 #ifndef _Restrict_arr_
 # ifdef __restrict_arr
 #  define _Restrict_arr_ __restrict_arr
-# elif ((199901L <= __STDC_VERSION__ || 3 < __GNUC__ + (1 <= __GNUC_MINOR__)) \
-        && !defined __GNUG__)
+# elif ((199901L <= __STDC_VERSION__ \
+         || 3 < __GNUC__ + (1 <= __GNUC_MINOR__) \
+         || __clang_major__ >= 3) \
+        && !defined __cplusplus)
 #  define _Restrict_arr_ _Restrict_
 # else
 #  define _Restrict_arr_
diff --git a/posix/regex_internal.c b/posix/regex_internal.c
index e1b6b4d5af..ed0a13461b 100644
--- a/posix/regex_internal.c
+++ b/posix/regex_internal.c
@@ -300,18 +300,20 @@ build_wcs_upper_buffer (re_string_t *pstr)
       while (byte_idx < end_idx)
         {
           wchar_t wc;
+          unsigned char ch = pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx];
 
-          if (isascii (pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx])
-              && mbsinit (&pstr->cur_state))
+          if (isascii (ch) && mbsinit (&pstr->cur_state))
             {
-              /* In case of a singlebyte character.  */
-              pstr->mbs[byte_idx]
-                = toupper (pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx]);
               /* The next step uses the assumption that wchar_t is encoded
                  ASCII-safe: all ASCII values can be converted like this.  */
-              pstr->wcs[byte_idx] = (wchar_t) pstr->mbs[byte_idx];
-              ++byte_idx;
-              continue;
+              wchar_t wcu = __towupper (ch);
+              if (isascii (wcu))
+                {
+                  pstr->mbs[byte_idx] = wcu;
+                  pstr->wcs[byte_idx] = wcu;
+                  byte_idx++;
+                  continue;
+                }
             }
 
           remain_len = end_idx - byte_idx;
@@ -348,7 +350,6 @@ build_wcs_upper_buffer (re_string_t *pstr)
             {
               /* It is an invalid character, an incomplete character
                  at the end of the string, or '\0'.  Just use the byte.  */
-              int ch = pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx];
               pstr->mbs[byte_idx] = ch;
               /* And also cast it to wide char.  */
               pstr->wcs[byte_idx++] = (wchar_t) ch;
diff --git a/posix/regex_internal.h b/posix/regex_internal.h
index 8c42586c42..4a3cf779bf 100644
--- a/posix/regex_internal.h
+++ b/posix/regex_internal.h
@@ -77,6 +77,14 @@
 # define isblank(ch) ((ch) == ' ' || (ch) == '\t')
 #endif
 
+/* regex code assumes isascii has its usual numeric meaning,
+   even if the portable character set uses EBCDIC encoding,
+   and even if wint_t is wider than int.  */
+#ifndef _LIBC
+# undef isascii
+# define isascii(c) (((c) & ~0x7f) == 0)
+#endif
+
 #ifdef _LIBC
 # ifndef _RE_DEFINE_LOCALE_FUNCTIONS
 #  define _RE_DEFINE_LOCALE_FUNCTIONS 1
@@ -335,7 +343,7 @@ typedef struct
     Idx idx;                    /* for BACK_REF */
     re_context_type ctx_type;   /* for ANCHOR */
   } opr;
-#if __GNUC__ >= 2 && !defined __STRICT_ANSI__
+#if (__GNUC__ >= 2 || defined __clang__) && !defined __STRICT_ANSI__
   re_token_type_t type : 8;
 #else
   re_token_type_t type;
@@ -841,10 +849,10 @@ re_string_elem_size_at (const re_string_t *pstr, Idx idx)
 #endif /* RE_ENABLE_I18N */
 
 #ifndef FALLTHROUGH
-# if __GNUC__ < 7
-#  define FALLTHROUGH ((void) 0)
-# else
+# if (__GNUC__ >= 7) || (__clang_major__ >= 10)
 #  define FALLTHROUGH __attribute__ ((__fallthrough__))
+# else
+#  define FALLTHROUGH ((void) 0)
 # endif
 #endif
 
-- 
2.25.1