From: Carlos O'Donell <carlos@redhat.com>
To: Zack Weinberg <zackw@panix.com>, libc-alpha@sourceware.org
Subject: Re: [PATCH v4] Use a proper C tokenizer to implement the obsolete typedefs test.
Date: Mon, 11 Mar 2019 14:57:52 -0400 [thread overview]
Message-ID: <36fa3d03-0dc4-bbd7-5c99-c8fe19255b8e@redhat.com> (raw)
In-Reply-To: <20190311145927.20399-1-zackw@panix.com>
On 3/11/19 10:59 AM, Zack Weinberg wrote:
> The test for obsolete typedefs in installed headers was implemented
> using grep, and could therefore get false positives on e.g. “ulong”
> in a comment. It was also scanning all of the headers included by
> our headers, and therefore testing headers we don’t control, e.g.
> Linux kernel headers.
Correct.
> This patch splits the obsolete-typedef test from
> scripts/check-installed-headers.sh to a separate program,
> scripts/check-obsolete-constructs.py. Being implemented in Python,
> it is feasible to make it tokenize C accurately enough to avoid false
> positives on the contents of comments and strings. It also only
> examines $(headers) in each subdirectory--all the headers we install,
> but not any external dependencies of those headers. Headers whose
> installed name starts with finclude/ are ignored, on the assumption
> that they contain Fortran.
OK.
> It is also feasible to make the new test understand the difference
> between _defining_ the obsolete typedefs and _using_ the obsolete
> typedefs, which means posix/{bits,sys}/types.h no longer need to be
> exempted. This uncovered an actual bug in bits/types.h: __quad_t and
> __u_quad_t were being used to define __S64_TYPE, __U64_TYPE,
> __SQUAD_TYPE and __UQUAD_TYPE. These are changed to __int64_t and
> __uint64_t respectively. This is a safe change, despite the comments
> in bits/types.h claiming a difference between __quad_t and __int64_t,
> because those comments are incorrect. In all current ABIs, both
> __quad_t and __int64_t are ‘long’ when ‘long’ is a 64-bit type, and
> ‘long long’ when ‘long’ is a 32-bit type, and similarly for __u_quad_t
> and __uint64_t. (Changing the types to be what the comments say they
> are would be an ABI break, as it affects C++ name mangling.) This
> patch includes a minimal change to make the comments not completely
> wrong. I plan to remove __SQUAD_TYPE and __UQUAD_TYPE altogether in
> subseqent patches, but that would be inappropriate for backporting to
> release branches.
OK.
> sys/types.h was defining the legacy BSD u_intN_t typedefs using a
> construct that was not necessarily consistent with how the C99 uintN_t
> typedefs are defined, and is also too complicated for the new script to
> understand (it lexes C relatively accurately, but it does not attempt
> to expand preprocessor macros, nor does it do any actual parsing).
> This patch cuts all of that out and uses bits/types.h's __uintN_t typedefs
> to define u_intN_t instead. This is verified to not change the ABI on
> any supported architecture, via the c++-types test, which means u_intN_t
> and uintN_t were, in fact, consistent on all supported architectures.
> I plan to restrict u_intN_t and some other legacy typedefs (but not
> intN_t) to __USE_MISC in subsequent patches, but again that would be
> inappropriate for backporting to release branches.
OK.
> * scripts/check-obsolete-constructs.py: New test script.
> * scripts/check-installed-headers.sh: Remove tests for
> obsolete typedefs, superseded by check-obsolete-constructs.py.
> * Rules: Run scripts/check-obsolete-constructs.py over $(headers)
> as a special test. Update commentary.
> * posix/bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.
> (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.
> Update commentary.
> * posix/sys/types.h (__u_intN_t): Remove.
> (u_int8_t): Typedef using __uint8_t.
> (u_int16_t): Typedef using __uint16_t.
> (u_int32_t): Typedef using __uint32_t.
> (u_int64_t): Typedef using __uint64_t.
OK for master if you:
- Fix the conditional in check-installed-headers.sh.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> ---
>
> Changes since v3: Fortran headers are now detected by path
> (finclude/*) instead of looking for Emacs mode annotations within the
> file. tokenize_c is now responsible for issuing errors for the BAD_*
> and OTHER lexical productions, and no longer returns BAD_* tokens.
> It is also responsible for tracking whether or not each token belongs
> to a preprocessing directive line, and accurately tokenizes #include
> arguments. (Accurate tokenization of #include arguments will be
> required by future patches I have planned.)
OK.
> ---
> Rules | 17 +-
> posix/bits/types.h | 10 +-
> posix/sys/types.h | 33 +-
> scripts/check-installed-headers.sh | 37 +--
> scripts/check-obsolete-constructs.py | 466 +++++++++++++++++++++++++++
> 5 files changed, 500 insertions(+), 63 deletions(-)
> create mode 100755 scripts/check-obsolete-constructs.py
>
> diff --git a/Rules b/Rules
> index e08a28d9f3..222dba6dcb 100644
> --- a/Rules
> +++ b/Rules
> @@ -82,7 +82,8 @@ $(common-objpfx)dummy.c:
> common-generated += dummy.o dummy.c
>
> ifneq "$(headers)" ""
> -# Special test of all the installed headers in this directory.
> +# Test that all of the headers installed by this directory can be compiled
> +# in isolation.
OK.
> tests-special += $(objpfx)check-installed-headers-c.out
> libof-check-installed-headers-c := testsuite
> $(objpfx)check-installed-headers-c.out: \
> @@ -93,6 +94,8 @@ $(objpfx)check-installed-headers-c.out: \
> $(evaluate-test)
>
> ifneq "$(CXX)" ""
> +# If a C++ compiler is available, also test that they can be compiled
> +# in isolation as C++.
OK.
> tests-special += $(objpfx)check-installed-headers-cxx.out
> libof-check-installed-headers-cxx := testsuite
> $(objpfx)check-installed-headers-cxx.out: \
> @@ -103,12 +106,24 @@ $(objpfx)check-installed-headers-cxx.out: \
> $(evaluate-test)
> endif # $(CXX)
>
> +# Test that a wrapper header exists in include/ for each non-sysdeps header.
> +# This script does not need $(py-env).
OK.
> tests-special += $(objpfx)check-wrapper-headers.out
> $(objpfx)check-wrapper-headers.out: \
> $(..)scripts/check-wrapper-headers.py $(headers)
> $(PYTHON) $< --root=$(..) --subdir=$(subdir) $(headers) > $@; \
> $(evaluate-test)
>
> +# Test that none of the headers installed by this directory use certain
> +# obsolete constructs (e.g. legacy BSD typedefs superseded by stdint.h).
> +# This script does not need $(py-env).
> +tests-special += $(objpfx)check-obsolete-constructs.out
> +libof-check-obsolete-constructs := testsuite
> +$(objpfx)check-obsolete-constructs.out: \
> + $(..)scripts/check-obsolete-constructs.py $(headers)
> + $(PYTHON) $^ > $@ 2>&1; \
> + $(evaluate-test)
OK.
> +
> endif # $(headers)
> \f
> # This makes all the auxiliary and test programs.
> diff --git a/posix/bits/types.h b/posix/bits/types.h
> index 27e065c3be..0de6c59bb4 100644
> --- a/posix/bits/types.h
> +++ b/posix/bits/types.h
> @@ -87,7 +87,7 @@ __extension__ typedef unsigned long long int __uintmax_t;
> 32 -- "natural" 32-bit type (always int)
> 64 -- "natural" 64-bit type (long or long long)
> LONG32 -- 32-bit type, traditionally long
> - QUAD -- 64-bit type, always long long
> + QUAD -- 64-bit type, traditionally long long
OK.
> WORD -- natural type of __WORDSIZE bits (int or long)
> LONGWORD -- type of __WORDSIZE bits, traditionally long
>
> @@ -113,14 +113,14 @@ __extension__ typedef unsigned long long int __uintmax_t;
> #define __SLONGWORD_TYPE long int
> #define __ULONGWORD_TYPE unsigned long int
> #if __WORDSIZE == 32
> -# define __SQUAD_TYPE __quad_t
> -# define __UQUAD_TYPE __u_quad_t
> +# define __SQUAD_TYPE __int64_t
> +# define __UQUAD_TYPE __uint64_t
OK.
> # define __SWORD_TYPE int
> # define __UWORD_TYPE unsigned int
> # define __SLONG32_TYPE long int
> # define __ULONG32_TYPE unsigned long int
> -# define __S64_TYPE __quad_t
> -# define __U64_TYPE __u_quad_t
> +# define __S64_TYPE __int64_t
> +# define __U64_TYPE __uint64_t
OK.
> /* We want __extension__ before typedef's that use nonstandard base types
> such as `long long' in C89 mode. */
> # define __STD_TYPE __extension__ typedef
> diff --git a/posix/sys/types.h b/posix/sys/types.h
> index 27129c5c23..0e37b1ce6a 100644
> --- a/posix/sys/types.h
> +++ b/posix/sys/types.h
> @@ -154,37 +154,20 @@ typedef unsigned int uint;
>
> #include <bits/stdint-intn.h>
>
> -#if !__GNUC_PREREQ (2, 7)
> -
> /* These were defined by ISO C without the first `_'. */
> -typedef unsigned char u_int8_t;
> -typedef unsigned short int u_int16_t;
> -typedef unsigned int u_int32_t;
> -# if __WORDSIZE == 64
> -typedef unsigned long int u_int64_t;
> -# else
> -__extension__ typedef unsigned long long int u_int64_t;
> -# endif
> -
> -typedef int register_t;
OK, this removed bug covered below.
> -
> -#else
> -
> -/* For GCC 2.7 and later, we can use specific type-size attributes. */
> -# define __u_intN_t(N, MODE) \
> - typedef unsigned int u_int##N##_t __attribute__ ((__mode__ (MODE)))
> -
> -__u_intN_t (8, __QI__);
> -__u_intN_t (16, __HI__);
> -__u_intN_t (32, __SI__);
> -__u_intN_t (64, __DI__);
> +typedef __uint8_t u_int8_t;
> +typedef __uint16_t u_int16_t;
> +typedef __uint32_t u_int32_t;
> +typedef __uint64_t u_int64_t;
OK, switch to 4 new typedefs that don't change meaning.
>
> +#if __GNUC_PREREQ (2, 7)
OK, retain old check here.
> typedef int register_t __attribute__ ((__mode__ (__word__)));
> -
> +#else
> +typedef int register_t;
> +#endif
OK.
>
> /* Some code from BIND tests this macro to see if the types above are
> defined. */
> -#endif
OK.
> #define __BIT_TYPES_DEFINED__ 1
>
>
> diff --git a/scripts/check-installed-headers.sh b/scripts/check-installed-headers.sh
> index 1f4496446c..e4f37d33f8 100644
> --- a/scripts/check-installed-headers.sh
> +++ b/scripts/check-installed-headers.sh
> @@ -16,11 +16,9 @@
> # License along with the GNU C Library; if not, see
> # <http://www.gnu.org/licenses/>.
>
> -# Check installed headers for cleanliness. For each header, confirm
> -# that it's possible to compile a file that includes that header and
> -# does nothing else, in several different compilation modes. Also,
> -# scan the header for a set of obsolete typedefs that should no longer
> -# appear.
> +# For each installed header, confirm that it's possible to compile a
> +# file that includes that header and does nothing else, in several
> +# different compilation modes.
OK.
>
> # These compilation switches assume GCC or compatible, which is probably
> # fine since we also assume that when _building_ glibc.
> @@ -31,13 +29,6 @@ cxx_modes="-std=c++98 -std=gnu++98 -std=c++11 -std=gnu++11"
> # These are probably the most commonly used three.
> lib_modes="-D_DEFAULT_SOURCE=1 -D_GNU_SOURCE=1 -D_XOPEN_SOURCE=700"
>
> -# sys/types.h+bits/types.h have to define the obsolete types.
> -# rpc(svc)/* have the obsolete types too deeply embedded in their API
> -# to remove.
> -skip_obsolete_type_check='*/sys/types.h|*/bits/types.h|*/rpc/*|*/rpcsvc/*'
> -obsolete_type_re=\
> -'\<((__)?(quad_t|u(short|int|long|_(char|short|int([0-9]+_t)?|long|quad_t))))\>'
> -
OK.
> if [ $# -lt 3 ]; then
> echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
> exit 2
> @@ -46,14 +37,10 @@ case "$1" in
> (c)
> lang_modes="$c_modes"
> cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.c)
> - already="$skip_obsolete_type_check"
OK.
> ;;
> (c++)
> lang_modes="$cxx_modes"
> cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.cc)
> - # The obsolete-type check can be skipped for C++; it is
> - # sufficient to do it for C.
> - already="*"
OK.
> ;;
> (*)
> echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
> @@ -155,22 +142,8 @@ $expanded_lib_mode
> int avoid_empty_translation_unit;
> EOF
> if $cc_cmd -fsyntax-only $lang_mode "$cih_test_c" 2>&1
> - then
> - includes=$($cc_cmd -fsyntax-only -H $lang_mode \
> - "$cih_test_c" 2>&1 | sed -ne 's/^[.][.]* //p')
> - for h in $includes; do
> - # Don't repeat work.
> - eval 'case "$h" in ('"$already"') continue;; esac'
> -
> - if grep -qE "$obsolete_type_re" "$h"; then
> - echo "*** Obsolete types detected:"
> - grep -HE "$obsolete_type_re" "$h"
> - failed=1
> - fi
> - already="$already|$h"
> - done
> - else
> - failed=1
> + then :
> + else failed=1
Why not 'if ! $cc_cmd ...' ? Which avoids the odd empty if block e.g. ":".
> fi
> done
> done
> diff --git a/scripts/check-obsolete-constructs.py b/scripts/check-obsolete-constructs.py
> new file mode 100755
> index 0000000000..46535afcac
> --- /dev/null
> +++ b/scripts/check-obsolete-constructs.py
> @@ -0,1 +1,466 @@
> +#! /usr/bin/python3
OK.
> +# Copyright (C) 2019 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +"""Verifies that installed headers do not use any obsolete constructs:
> + * legacy BSD typedefs superseded by <stdint.h>:
> + ushort uint ulong u_char u_short u_int u_long u_intNN_t quad_t u_quad_t
> + (sys/types.h is allowed to _define_ these types, but not to use them
> + to define anything else).
> +"""
> +
> +import argparse
> +import collections
> +import re
> +import sys
> +
> +# Simplified lexical analyzer for C preprocessing tokens.
> +# Does not implement trigraphs.
> +# Does not implement backslash-newline in the middle of any lexical
> +# item other than a string literal.
> +# Does not implement universal-character-names in identifiers.
> +# Treats prefixed strings (e.g. L"...") as two tokens (L and "...")
> +# Accepts non-ASCII characters only within comments and strings.
> +
> +# Caution: The order of the outermost alternation matters.
> +# STRING must be before BAD_STRING, CHARCONST before BAD_CHARCONST,
> +# BLOCK_COMMENT before BAD_BLOCK_COM before PUNCTUATOR, and OTHER must
> +# be last.
> +# Caution: There should be no capturing groups other than the named
> +# captures in the outermost alternation.
> +
> +# For reference, these are all of the C punctuators as of C11:
> +# [ ] ( ) { } , ; ? ~
> +# ! != * *= / /= ^ ^= = ==
> +# # ##
> +# % %= %> %: %:%:
> +# & &= &&
> +# | |= ||
> +# + += ++
> +# - -= -- ->
> +# . ...
> +# : :>
> +# < <% <: << <<= <=
> +# > >= >> >>=
> +
> +# The BAD_* tokens are not part of the official definition of pp-tokens;
> +# they match unclosed strings, character constants, and block comments,
> +# so that the regex engine doesn't have to backtrack all the way to the
> +# beginning of a broken construct and then emit dozens of junk tokens.
> +
> +PP_TOKEN_RE_ = re.compile(r"""
> + (?P<STRING> \"(?:[^\"\\\r\n]|\\(?:[\r\n -~]|\r\n))*\")
> + |(?P<BAD_STRING> \"(?:[^\"\\\r\n]|\\[ -~])*)
> + |(?P<CHARCONST> \'(?:[^\'\\\r\n]|\\(?:[\r\n -~]|\r\n))*\')
> + |(?P<BAD_CHARCONST> \'(?:[^\'\\\r\n]|\\[ -~])*)
> + |(?P<BLOCK_COMMENT> /\*(?:\*(?!/)|[^*])*\*/)
> + |(?P<BAD_BLOCK_COM> /\*(?:\*(?!/)|[^*])*\*?)
> + |(?P<LINE_COMMENT> //[^\r\n]*)
> + |(?P<IDENT> [_a-zA-Z][_a-zA-Z0-9]*)
> + |(?P<PP_NUMBER> \.?[0-9](?:[0-9a-df-oq-zA-DF-OQ-Z_.]|[eEpP][+-]?)*)
> + |(?P<PUNCTUATOR>
> + [,;?~(){}\[\]]
> + | [!*/^=]=?
> + | \#\#?
> + | %(?:[=>]|:(?:%:)?)?
> + | &[=&]?
> + |\|[=|]?
> + |\+[=+]?
> + | -[=->]?
> + |\.(?:\.\.)?
> + | :>?
> + | <(?:[%:]|<(?:=|<=?)?)?
> + | >(?:=|>=?)?)
> + |(?P<ESCNL> \\(?:\r|\n|\r\n))
> + |(?P<WHITESPACE> [ \t\n\r\v\f]+)
> + |(?P<OTHER> .)
> +""", re.DOTALL | re.VERBOSE)
> +
> +HEADER_NAME_RE_ = re.compile(r"""
> + < [^>\r\n]+ >
> + | " [^"\r\n]+ "
> +""", re.DOTALL | re.VERBOSE)
> +
> +ENDLINE_RE_ = re.compile(r"""\r|\n|\r\n""")
> +
> +# based on the sample code in the Python re documentation
> +Token_ = collections.namedtuple("Token", (
> + "kind", "text", "line", "column", "context"))
> +Token_.__doc__ = """
> + One C preprocessing token, comment, or chunk of whitespace.
> + 'kind' identifies the token type, which will be one of:
> + STRING, CHARCONST, BLOCK_COMMENT, LINE_COMMENT, IDENT,
> + PP_NUMBER, PUNCTUATOR, ESCNL, WHITESPACE, HEADER_NAME,
> + or OTHER. The BAD_* alternatives in PP_TOKEN_RE_ are
> + handled within tokenize_c, below.
> +
> + 'text' is the sequence of source characters making up the token;
> + no decoding whatsoever is performed.
> +
> + 'line' and 'column' give the position of the first character of the
> + token within the source file. They are both 1-based.
> +
> + 'context' indicates whether or not this token occurred within a
> + preprocessing directive; it will be None for running text,
> + '<null>' for the leading '#' of a directive line (because '#'
> + all by itself on a line is a "null directive"), or the name of
> + the directive for tokens within a directive line, starting with
> + the IDENT for the name itself.
> +"""
> +
> +def tokenize_c(file_contents, reporter):
> + """Yield a series of Token objects, one for each preprocessing
> + token, comment, or chunk of whitespace within FILE_CONTENTS.
> + The REPORTER object is expected to have one method,
> + reporter.error(token, message), which will be called to
> + indicate a lexical error at the position of TOKEN.
> + If MESSAGE contains the four-character sequence '{!r}', that
> + is expected to be replaced by repr(token.text).
> + """
> +
> + Token = Token_
> + PP_TOKEN_RE = PP_TOKEN_RE_
> + ENDLINE_RE = ENDLINE_RE_
> + HEADER_NAME_RE = HEADER_NAME_RE_
> +
> + line_num = 1
> + line_start = 0
> + pos = 0
> + limit = len(file_contents)
> + directive = None
> + at_bol = True
> + while pos < limit:
> + if directive == "include":
> + mo = HEADER_NAME_RE.match(file_contents, pos)
> + if mo:
> + kind = "HEADER_NAME"
> + directive = "after_include"
> + else:
> + mo = PP_TOKEN_RE.match(file_contents, pos)
> + kind = mo.lastgroup
> + if kind != "WHITESPACE":
> + directive = "after_include"
> + else:
> + mo = PP_TOKEN_RE.match(file_contents, pos)
> + kind = mo.lastgroup
> +
> + text = mo.group()
> + line = line_num
> + column = mo.start() - line_start
> + adj_line_start = 0
> + # only these kinds can contain a newline
> + if kind in ("WHITESPACE", "BLOCK_COMMENT", "LINE_COMMENT",
> + "STRING", "CHARCONST", "BAD_BLOCK_COM", "ESCNL"):
> + for tmo in ENDLINE_RE.finditer(text):
> + line_num += 1
> + adj_line_start = tmo.end()
> + if adj_line_start:
> + line_start = mo.start() + adj_line_start
> +
> + # Track whether or not we are scanning a preprocessing directive.
> + if kind == "LINE_COMMENT" or (kind == "WHITESPACE" and adj_line_start):
> + at_bol = True
> + directive = None
> + else:
> + if kind == "PUNCTUATOR" and text == "#" and at_bol:
> + directive = "<null>"
> + elif kind == "IDENT" and directive == "<null>":
> + directive = text
> + at_bol = False
> +
> + # Report ill-formed tokens and rewrite them as their well-formed
> + # equivalents, so downstream processing doesn't have to know about them.
> + # (Rewriting instead of discarding provides better error recovery.)
> + if kind == "BAD_BLOCK_COM":
> + reporter.error(Token("BAD_BLOCK_COM", "", line, column+1, ""),
> + "unclosed block comment")
> + text += "*/"
> + kind = "BLOCK_COMMENT"
> + elif kind == "BAD_STRING":
> + reporter.error(Token("BAD_STRING", "", line, column+1, ""),
> + "unclosed string")
> + text += "\""
> + kind = "STRING"
> + elif kind == "BAD_CHARCONST":
> + reporter.error(Token("BAD_CHARCONST", "", line, column+1, ""),
> + "unclosed char constant")
> + text += "'"
> + kind = "CHARCONST"
> +
> + tok = Token(kind, text, line, column+1,
> + "include" if directive == "after_include" else directive)
> + # Do not complain about OTHER tokens inside macro definitions.
> + # $ and @ appear in macros defined by headers intended to be
> + # included from assembly language, e.g. sysdeps/mips/sys/asm.h.
> + if kind == "OTHER" and directive != "define":
> + self.error(tok, "stray {!r} in program")
> +
> + yield tok
> + pos = mo.end()
> +
> +#
> +# Base and generic classes for individual checks.
> +#
> +
> +class ConstructChecker:
> + """Scan a stream of C preprocessing tokens and possibly report
> + problems with them. The REPORTER object passed to __init__ has
> + one method, reporter.error(token, message), which should be
> + called to indicate a problem detected at the position of TOKEN.
> + If MESSAGE contains the four-character sequence '{!r}' then that
> + will be replaced with a textual representation of TOKEN.
> + """
> + def __init__(self, reporter):
> + self.reporter = reporter
> +
> + def examine(self, tok):
> + """Called once for each token in a header file.
> + Call self.reporter.error if a problem is detected.
> + """
> + raise NotImplementedError
> +
> + def eof(self):
> + """Called once at the end of the stream. Subclasses need only
> + override this if it might have something to do."""
> + pass
> +
> +class NoCheck(ConstructChecker):
> + """Generic checker class which doesn't do anything. Substitute this
> + class for a real checker when a particular check should be skipped
> + for some file."""
> +
> + def examine(self, tok):
> + pass
> +
> +#
> +# Check for obsolete type names.
> +#
> +
> +# The obsolete type names we're looking for:
> +OBSOLETE_TYPE_RE_ = re.compile(r"""\A
> + (__)?
> + ( quad_t
> + | u(?: short | int | long
> + | _(?: char | short | int(?:[0-9]+_t)? | long | quad_t )))
> +\Z""", re.VERBOSE)
> +
> +class ObsoleteNotAllowed(ConstructChecker):
> + """Don't allow any use of the obsolete typedefs."""
> + def examine(self, tok):
> + if OBSOLETE_TYPE_RE_.match(tok.text):
> + self.reporter.error(tok, "use of {!r}")
> +
> +class ObsoletePrivateDefinitionsAllowed(ConstructChecker):
> + """Allow definitions of the private versions of the
> + obsolete typedefs; that is, 'typedef [anything] __obsolete;'
> + """
> + def __init__(self, reporter):
> + super().__init__(reporter)
> + self.in_typedef = False
> + self.prev_token = None
> +
> + def examine(self, tok):
> + # bits/types.h hides 'typedef' in a macro sometimes.
> + if (tok.kind == "IDENT"
> + and tok.text in ("typedef", "__STD_TYPE")
> + and tok.context is None):
> + self.in_typedef = True
> + elif tok.kind == "PUNCTUATOR" and tok.text == ";" and self.in_typedef:
> + self.in_typedef = False
> + if self.prev_token.kind == "IDENT":
> + m = OBSOLETE_TYPE_RE_.match(self.prev_token.text)
> + if m and m.group(1) != "__":
> + self.reporter.error(self.prev_token, "use of {!r}")
> + self.prev_token = None
> + else:
> + self._check_prev()
> +
> + self.prev_token = tok
> +
> + def eof(self):
> + self._check_prev()
> +
> + def _check_prev(self):
> + if (self.prev_token is not None
> + and self.prev_token.kind == "IDENT"
> + and OBSOLETE_TYPE_RE_.match(self.prev_token.text)):
> + self.reporter.error(self.prev_token, "use of {!r}")
> +
> +class ObsoletePublicDefinitionsAllowed(ConstructChecker):
> + """Allow definitions of the public versions of the obsolete
> + typedefs. Only specific forms of definition are allowed:
> +
> + typedef __obsolete obsolete; // identifiers must agree
> + typedef __uintN_t u_intN_t; // N must agree
> + typedef unsigned long int ulong;
> + typedef unsigned short int ushort;
> + typedef unsigned int uint;
> + """
> + def __init__(self, reporter):
> + super().__init__(reporter)
> + self.typedef_tokens = []
> +
> + def examine(self, tok):
> + if tok.kind in ("WHITESPACE", "BLOCK_COMMENT",
> + "LINE_COMMENT", "NL", "ESCNL"):
> + pass
> +
> + elif (tok.kind == "IDENT" and tok.text == "typedef"
> + and tok.context is None):
> + if self.typedef_tokens:
> + self.reporter.error(tok, "typedef inside typedef")
> + self._reset()
> + self.typedef_tokens.append(tok)
> +
> + elif tok.kind == "PUNCTUATOR" and tok.text == ";":
> + self._finish()
> +
> + elif self.typedef_tokens:
> + self.typedef_tokens.append(tok)
> +
> + def eof(self):
> + self._reset()
> +
> + def _reset(self):
> + while self.typedef_tokens:
> + tok = self.typedef_tokens.pop(0)
> + if tok.kind == "IDENT" and OBSOLETE_TYPE_RE_.match(tok.text):
> + self.reporter.error(tok, "use of {!r}")
> +
> + def _finish(self):
> + if not self.typedef_tokens: return
> + if self.typedef_tokens[-1].kind == "IDENT":
> + m = OBSOLETE_TYPE_RE_.match(self.typedef_tokens[-1].text)
> + if m:
> + if self._permissible_public_definition(m):
> + self.typedef_tokens.clear()
> + self._reset()
> +
> + def _permissible_public_definition(self, m):
> + if m.group(1) == "__": return False
> + name = m.group(2)
> + toks = self.typedef_tokens
> + ntok = len(toks)
> + if ntok == 3 and toks[1].kind == "IDENT":
> + defn = toks[1].text
> + n = OBSOLETE_TYPE_RE_.match(defn)
> + if n and n.group(1) == "__" and n.group(2) == name:
> + return True
> +
> + if (name[:5] == "u_int" and name[-2:] == "_t"
> + and defn[:6] == "__uint" and defn[-2:] == "_t"
> + and name[5:-2] == defn[6:-2]):
> + return True
> +
> + return False
> +
> + if (name == "ulong" and ntok == 5
> + and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
> + and toks[2].kind == "IDENT" and toks[2].text == "long"
> + and toks[3].kind == "IDENT" and toks[3].text == "int"):
> + return True
> +
> + if (name == "ushort" and ntok == 5
> + and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
> + and toks[2].kind == "IDENT" and toks[2].text == "short"
> + and toks[3].kind == "IDENT" and toks[3].text == "int"):
> + return True
> +
> + if (name == "uint" and ntok == 4
> + and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
> + and toks[2].kind == "IDENT" and toks[2].text == "int"):
> + return True
> +
> + return False
> +
> +def ObsoleteTypedefChecker(reporter, fname):
> + """Factory: produce an instance of the appropriate
> + obsolete-typedef checker for FNAME."""
> +
> + # The obsolete rpc/ and rpcsvc/ headers are allowed to use the
> + # obsolete types, because it would be more trouble than it's
> + # worth to remove them from headers that we intend to stop
> + # installing eventually anyway.
> + if (fname.startswith("rpc/")
> + or fname.startswith("rpcsvc/")
> + or "/rpc/" in fname
> + or "/rpcsvc/" in fname):
> + return NoCheck(reporter)
> +
> + # bits/types.h is allowed to define the __-versions of the
> + # obsolete types.
> + if (fname == "bits/types.h"
> + or fname.endswith("/bits/types.h")):
> + return ObsoletePrivateDefinitionsAllowed(reporter)
> +
> + # sys/types.h is allowed to use the __-versions of the
> + # obsolete types, but only to define the unprefixed versions.
> + if (fname == "sys/types.h"
> + or fname.endswith("/sys/types.h")):
> + return ObsoletePublicDefinitionsAllowed(reporter)
> +
> + return ObsoleteNotAllowed(reporter)
> +
> +#
> +# Master control
> +#
> +
> +class HeaderChecker:
> + """Perform all of the checks on each header. This is also the
> + "reporter" object expected by tokenize_c and ConstructChecker.
> + """
> + def __init__(self):
> + self.fname = None
> + self.status = 0
> +
> + def error(self, tok, message):
> + self.status = 1
> + if '{!r}' in message:
> + message = message.format(tok.text)
> + sys.stderr.write("{}:{}:{}: error: {}\n".format(
> + self.fname, tok.line, tok.column, message))
> +
> + def check(self, fname):
> + self.fname = fname
> + try:
> + with open(fname, "rt") as fp:
> + contents = fp.read()
> + except OSError as e:
> + sys.stderr.write("{}: {}\n".format(fname, e.strerror))
> + self.status = 1
> + return
> +
> + typedef_checker = ObsoleteTypedefChecker(self, self.fname)
> +
> + for tok in tokenize_c(contents, self):
> + typedef_checker.examine(tok)
> +
> +def main():
> + ap = argparse.ArgumentParser(description=__doc__)
> + ap.add_argument("headers", metavar="header", nargs="+",
> + help="one or more headers to scan for obsolete constructs")
> + args = ap.parse_args()
> +
> + checker = HeaderChecker()
> + for fname in args.headers:
> + # Headers whose installed name begins with "finclude/" contain
> + # Fortran, not C, and this program should completely ignore them.
> + if not (fname.startswith("finclude/") or "/finclude/" in fname):
> + checker.check(fname)
> + sys.exit(checker.status)
> +
> +main()
>
OK.
--
Cheers,
Carlos.
next prev parent reply other threads:[~2019-03-11 18:58 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-11 14:59 [PATCH v4] Use a proper C tokenizer to implement the obsolete typedefs test Zack Weinberg
2019-03-11 18:57 ` Carlos O'Donell [this message]
2019-03-12 0:59 ` Zack Weinberg
2019-03-12 3:47 ` Carlos O'Donell
2019-03-13 13:47 ` Zack Weinberg
2019-03-13 22:16 ` Joseph Myers
2019-03-14 13:00 ` Carlos O'Donell
2019-03-14 13:21 ` Zack Weinberg
2019-03-14 18:06 ` Joseph Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/libc/involved.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=36fa3d03-0dc4-bbd7-5c99-c8fe19255b8e@redhat.com \
--to=carlos@redhat.com \
--cc=libc-alpha@sourceware.org \
--cc=zackw@panix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).