[PATCH 0/7] Update the compat/regex engine from upstream

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/7] Update the compat/regex engine from upstream
@ 2017-05-04 22:00 Ævar Arnfjörð Bjarmason
  2017-05-04 22:00 ` [PATCH 1/7] compat/regex: add a README with a maintenance guide Ævar Arnfjörð Bjarmason
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-04 22:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins,
	Ævar Arnfjörð Bjarmason

See the first patch for motivation & why.

The only reason this has a cover letter is to explain the !fixup
commits. IIRC the mailing list has a 100K limit, which this series
would violate, so I split up the second commit.

Consider all these !fixup commits to have by Signed-off-by, easier to
say that here than to modify them all.

Ævar Arnfjörð Bjarmason (7):
  compat/regex: add a README with a maintenance guide
  compat/regex: update the gawk regex engine from upstream
  fixup! compat/regex: update the gawk regex engine from upstream
  fixup! compat/regex: update the gawk regex engine from upstream
  fixup! compat/regex: update the gawk regex engine from upstream
  fixup! compat/regex: update the gawk regex engine from upstream
  fixup! compat/regex: update the gawk regex engine from upstream

 Makefile                                           |   8 +-
 compat/regex/README                                |  21 +
 compat/regex/intprops.h                            | 448 +++++++++++++++++++++
 .../0001-Add-notice-at-top-of-copied-files.patch   | 120 ++++++
 .../0002-Remove-verify.h-use-from-intprops.h.patch |  41 ++
 compat/regex/regcomp.c                             | 356 +++++++++-------
 compat/regex/regex.c                               |  32 +-
 compat/regex/regex.h                               | 120 +++---
 compat/regex/regex_internal.c                      | 118 +++---
 compat/regex/regex_internal.h                      | 118 +++---
 compat/regex/regexec.c                             | 242 +++++------
 compat/regex/verify.h                              | 286 +++++++++++++
 12 files changed, 1487 insertions(+), 423 deletions(-)
 create mode 100644 compat/regex/README
 create mode 100644 compat/regex/intprops.h
 create mode 100644 compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch
 create mode 100644 compat/regex/patches/0002-Remove-verify.h-use-from-intprops.h.patch
 create mode 100644 compat/regex/verify.h

-- 
2.11.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/7] compat/regex: add a README with a maintenance guide
  2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
@ 2017-05-04 22:00 ` Ævar Arnfjörð Bjarmason
  2017-05-12  0:47   ` Junio C Hamano
  2017-05-04 22:00 ` [PATCH 2/7] compat/regex: update the gawk regex engine from upstream Ævar Arnfjörð Bjarmason
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-04 22:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins,
	Ævar Arnfjörð Bjarmason

Add a README file to compat/regex describing how the copy of the gawk
engine should be maintained.

Since gawk's regex engine was originally imported in git.git in commit
d18f76dccf ("compat/regex: use the regex engine from gawk for compat",
2010-08-17) the Git project has forked the upstream code.

Most of the changes that have been made in that time have been made
redundant by similar changes made upstream. Out of all the
modifications made to it since then, which can be found via:

    $ git log --oneline d18f76dccf..v2.13.0-rc2 -- compat/regex/

These are the only real code changes that aren't made fully redundant
by upstream patches:

    ce518bbd6c ("Fix compat/regex ANSIfication on MinGW", 2010-08-26)
    5b62e6374a ("compat/regex/regexec.c: Fix some sparse warnings", 2013-04-27)
    d099b7173d ("Fix some sparse warnings", 2013-07-18)

These look to me like they might be a non-issue due to subsequent
changes, or perhaps aren't needed anymore due to compiler updates.

In addition a few style & typo changes have been made in that time:

    ce9171cd63 ("compat/regex: fix spelling and grammar in comments", 2013-04-12)
    749f763dbb ("typofix: in-code comments", 2013-07-22)
    c01499ef69 ("C: have space around && and || operators", 2013-10-16)

Some of these could still be applied, but I don't see any point in
doing so. These are typo & style nits, if anyone really cares that
much they should send updates to gawk.git instead of making the
re-merging of code into git.git harder over such trivial issues.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 compat/regex/README | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 compat/regex/README

diff --git a/compat/regex/README b/compat/regex/README
new file mode 100644
index 0000000000..345d322d8c
--- /dev/null
+++ b/compat/regex/README
@@ -0,0 +1,21 @@
+This is the Git project's copy of the GNU awk (Gawk) regex
+engine. It's used when Git is build with e.g. NO_REGEX=NeedsStartEnd,
+or when the C library's regular expression functions are otherwise
+deficient.
+
+This is not a fork, but a source code copy. Upstream is the Gawk
+project, and the sources should be periodically updated from their
+copy, which can be done with:
+
+    for f in $(find . -name '*.[ch]' -printf "%f\n"); do wget http://git.savannah.gnu.org/cgit/gawk.git/plain/support/$f -O $f; done
+
+For ease of maintenance, and to intentionally make it inconvenient to
+diverge from upstream (since it makes it harder to re-merge) any local
+changes should be stored in the patches/ directory, which after doing
+the above can be applied as:
+
+    for p in patches/*; do patch -p3 < $p; done
+
+For any changes that aren't specific to the git.git copy please submit
+a patch to the Gawk project and/or to the GNU C library (the Gawk
+regex engine is a periodically & forked copy from glibc.git).
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/7] compat/regex: update the gawk regex engine from upstream
  2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
  2017-05-04 22:00 ` [PATCH 1/7] compat/regex: add a README with a maintenance guide Ævar Arnfjörð Bjarmason
@ 2017-05-04 22:00 ` Ævar Arnfjörð Bjarmason
  2017-05-04 22:00 ` [PATCH 3/7] fixup! " Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-04 22:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins,
	Ævar Arnfjörð Bjarmason

Update the gawk regex engine from the upstream gawk.git as detailed in
the README added in a previous change.

This is from gawk.git's gawk-4.1.0-2558-gb2651a80 which is the same
code as in the stable gawk-4.1.4 release, but with one trivial change
on top added in commit 725d2f78 ("Add small regex fix. Add support
directory.", 2016-12-22)[1]

The two patches applied on top of the upstream engine are to,
respectively:

 * Add a notice at the top of each file saying that this copy is
   maintained by the Git project.f

 * Remove the dependency on gawk's verify.h. The library compiles
   as-is when this header file is present, but unfortunately it's
   under GPL v3, unlike the rest of the files which is under LGPL 2.1
   or later.

The changes made in commit a997bf423d ("compat/regex: get the gawk
regex engine to compile within git", 2010-08-17) turned out to be
redundant to achieving the same with defining a few flags to make the
code itself do similar things.

In addition the -DNO_MBSUPPORT flag is not needed, upstream removed
the code that relied on that. It's possible that either -DHAVE_BTOWC
or -D_GNU_SOURCE could cause some problems on non-GNU systems.

The -DHAVE_BTOWC flag indicates that wchar.h has a btowc(3). This
function is defined in POSIX.1-2001 & C99 and later.

The -D_GNU_SOURCE flag is needed because the library itself does:

    #ifndef _LIBC
    #define __USE_GNU	1
    #endif

Which is subsequently picked up by GNU C library headers:

    In file included from compat/regex/regex_internal.h:32:0,
                     from compat/regex/regex.c:76:
    /usr/include/stdio.h:316:6: error: unknown type name ‘_IO_cookie_io_functions_t’; did you mean ‘__fortify_function’?
          _IO_cookie_io_functions_t __io_funcs) __THROW __wur;
          ^~~~~~~~~~~~~~~~~~~~~~~~~

1. http://git.savannah.gnu.org/cgit/gawk.git/commit/?id=725d2f78

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile                                           |   8 +-
 .../0001-Add-notice-at-top-of-copied-files.patch   | 120 +++++++++++++++++++++
 .../0002-Remove-verify.h-use-from-intprops.h.patch |  41 +++++++
 3 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch
 create mode 100644 compat/regex/patches/0002-Remove-verify.h-use-from-intprops.h.patch

diff --git a/Makefile b/Makefile
index e35542e631..6235e1b954 100644
--- a/Makefile
+++ b/Makefile
@@ -2060,7 +2060,13 @@ endif
 
 ifdef NO_REGEX
 compat/regex/regex.sp compat/regex/regex.o: EXTRA_CPPFLAGS = \
-	-DGAWK -DNO_MBSUPPORT
+	-DGAWK \
+	-DHAVE_WCHAR_H \
+	-DHAVE_WCTYPE_H \
+	-DHAVE_STDDEF_H \
+	-DHAVE_STDBOOL_H \
+	-DHAVE_BTOWC \
+	-D_GNU_SOURCE
 endif
 
 ifdef USE_NED_ALLOCATOR
diff --git a/compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch b/compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch
new file mode 100644
index 0000000000..4b4acc45ba
--- /dev/null
+++ b/compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch
@@ -0,0 +1,120 @@
+diff --git a/compat/regex/intprops.h b/compat/regex/intprops.h
+index 716741adc5..2aef98d290 100644
+--- a/compat/regex/intprops.h
++++ b/compat/regex/intprops.h
+@@ -1,3 +1,10 @@
++/*
++ * This is git.git's copy of gawk.git's regex engine. Please see that
++ * project for the latest version & to submit patches to this code,
++ * and git.git's compat/regex/README for information on how git's copy
++ * of this code is maintained.
++ */
++
+ /* intprops.h -- properties of integer types
+ 
+    Copyright (C) 2001-2016 Free Software Foundation, Inc.
+diff --git a/compat/regex/regcomp.c b/compat/regex/regcomp.c
+index 5ac5370142..a1fb2e400e 100644
+--- a/compat/regex/regcomp.c
++++ b/compat/regex/regcomp.c
+@@ -1,3 +1,10 @@
++/*
++ * This is git.git's copy of gawk.git's regex engine. Please see that
++ * project for the latest version & to submit patches to this code,
++ * and git.git's compat/regex/README for information on how git's copy
++ * of this code is maintained.
++ */
++
+ /* Extended regular expression matching and search library.
+    Copyright (C) 2002-2016 Free Software Foundation, Inc.
+    This file is part of the GNU C Library.
+diff --git a/compat/regex/regex.c b/compat/regex/regex.c
+index 9f133fab84..d6e525e567 100644
+--- a/compat/regex/regex.c
++++ b/compat/regex/regex.c
+@@ -1,3 +1,10 @@
++/*
++ * This is git.git's copy of gawk.git's regex engine. Please see that
++ * project for the latest version & to submit patches to this code,
++ * and git.git's compat/regex/README for information on how git's copy
++ * of this code is maintained.
++ */
++
+ /* Extended regular expression matching and search library.
+    Copyright (C) 2002-2016 Free Software Foundation, Inc.
+    This file is part of the GNU C Library.
+diff --git a/compat/regex/regex.h b/compat/regex/regex.h
+index 143b3afa89..b602b5567f 100644
+--- a/compat/regex/regex.h
++++ b/compat/regex/regex.h
+@@ -1,3 +1,10 @@
++/*
++ * This is git.git's copy of gawk.git's regex engine. Please see that
++ * project for the latest version & to submit patches to this code,
++ * and git.git's compat/regex/README for information on how git's copy
++ * of this code is maintained.
++ */
++
+ /* Definitions for data structures and routines for the regular
+    expression library.
+    Copyright (C) 1985, 1989-2016 Free Software Foundation, Inc.
+diff --git a/compat/regex/regex_internal.c b/compat/regex/regex_internal.c
+index 18641ef1c0..6d766114a1 100644
+--- a/compat/regex/regex_internal.c
++++ b/compat/regex/regex_internal.c
+@@ -1,3 +1,10 @@
++/*
++ * This is git.git's copy of gawk.git's regex engine. Please see that
++ * project for the latest version & to submit patches to this code,
++ * and git.git's compat/regex/README for information on how git's copy
++ * of this code is maintained.
++ */
++
+ /* Extended regular expression matching and search library.
+    Copyright (C) 2002-2016 Free Software Foundation, Inc.
+    This file is part of the GNU C Library.
+diff --git a/compat/regex/regex_internal.h b/compat/regex/regex_internal.h
+index 01465e7678..9c88a6a57b 100644
+--- a/compat/regex/regex_internal.h
++++ b/compat/regex/regex_internal.h
+@@ -1,3 +1,10 @@
++/*
++ * This is git.git's copy of gawk.git's regex engine. Please see that
++ * project for the latest version & to submit patches to this code,
++ * and git.git's compat/regex/README for information on how git's copy
++ * of this code is maintained.
++ */
++
+ /* Extended regular expression matching and search library.
+    Copyright (C) 2002-2016 Free Software Foundation, Inc.
+    This file is part of the GNU C Library.
+diff --git a/compat/regex/regexec.c b/compat/regex/regexec.c
+index c8f11e52e7..c79ff38b1c 100644
+--- a/compat/regex/regexec.c
++++ b/compat/regex/regexec.c
+@@ -1,3 +1,10 @@
++/*
++ * This is git.git's copy of gawk.git's regex engine. Please see that
++ * project for the latest version & to submit patches to this code,
++ * and git.git's compat/regex/README for information on how git's copy
++ * of this code is maintained.
++ */
++
+ /* Extended regular expression matching and search library.
+    Copyright (C) 2002-2016 Free Software Foundation, Inc.
+    This file is part of the GNU C Library.
+diff --git a/compat/regex/verify.h b/compat/regex/verify.h
+index 5c8381d290..e865af5298 100644
+--- a/compat/regex/verify.h
++++ b/compat/regex/verify.h
+@@ -1,3 +1,10 @@
++/*
++ * This is git.git's copy of gawk.git's regex engine. Please see that
++ * project for the latest version & to submit patches to this code,
++ * and git.git's compat/regex/README for information on how git's copy
++ * of this code is maintained.
++ */
++
+ /* Compile-time assert-like macros.
+ 
+    Copyright (C) 2005-2006, 2009-2016 Free Software Foundation, Inc.
diff --git a/compat/regex/patches/0002-Remove-verify.h-use-from-intprops.h.patch b/compat/regex/patches/0002-Remove-verify.h-use-from-intprops.h.patch
new file mode 100644
index 0000000000..16c3fd30dd
--- /dev/null
+++ b/compat/regex/patches/0002-Remove-verify.h-use-from-intprops.h.patch
@@ -0,0 +1,41 @@
+diff --git a/compat/regex/intprops.h b/compat/regex/intprops.h
+index 2aef98d290..29f7f40837 100644
+--- a/compat/regex/intprops.h
++++ b/compat/regex/intprops.h
+@@ -28,7 +28,6 @@
+ #define _GL_INTPROPS_H
+ 
+ #include <limits.h>
+-#include <verify.h>
+ 
+ #ifndef __has_builtin
+ # define __has_builtin(x) 0
+@@ -88,28 +87,6 @@
+ # define LLONG_MIN __INT64_MIN
+ #endif
+ 
+-/* This include file assumes that signed types are two's complement without
+-   padding bits; the above macros have undefined behavior otherwise.
+-   If this is a problem for you, please let us know how to fix it for your host.
+-   As a sanity check, test the assumption for some signed types that
+-   <limits.h> bounds.  */
+-verify (TYPE_MINIMUM (signed char) == SCHAR_MIN);
+-verify (TYPE_MAXIMUM (signed char) == SCHAR_MAX);
+-verify (TYPE_MINIMUM (short int) == SHRT_MIN);
+-verify (TYPE_MAXIMUM (short int) == SHRT_MAX);
+-verify (TYPE_MINIMUM (int) == INT_MIN);
+-verify (TYPE_MAXIMUM (int) == INT_MAX);
+-verify (TYPE_MINIMUM (long int) == LONG_MIN);
+-verify (TYPE_MAXIMUM (long int) == LONG_MAX);
+-#ifdef LLONG_MAX
+-verify (TYPE_MINIMUM (long long int) == LLONG_MIN);
+-verify (TYPE_MAXIMUM (long long int) == LLONG_MAX);
+-#endif
+-/* Similarly, sanity-check one ISO/IEC TS 18661-1:2014 macro if defined.  */
+-#ifdef UINT_WIDTH
+-verify (TYPE_WIDTH (unsigned int) == UINT_WIDTH);
+-#endif
+-
+ /* Does the __typeof__ keyword work?  This could be done by
+    'configure', but for now it's easier to do it by hand.  */
+ #if (2 <= __GNUC__ \
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/7] fixup! compat/regex: update the gawk regex engine from upstream
  2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
  2017-05-04 22:00 ` [PATCH 1/7] compat/regex: add a README with a maintenance guide Ævar Arnfjörð Bjarmason
  2017-05-04 22:00 ` [PATCH 2/7] compat/regex: update the gawk regex engine from upstream Ævar Arnfjörð Bjarmason
@ 2017-05-04 22:00 ` Ævar Arnfjörð Bjarmason
  2017-05-05  5:54   ` Johannes Sixt
  2017-05-04 22:00 ` [PATCH 4/7] " Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-04 22:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins,
	Ævar Arnfjörð Bjarmason

---
 compat/regex/regcomp.c | 356 +++++++++++++++++++++++++++++--------------------
 1 file changed, 209 insertions(+), 147 deletions(-)

diff --git a/compat/regex/regcomp.c b/compat/regex/regcomp.c
index d8bde06f1a..a1fb2e400e 100644
--- a/compat/regex/regcomp.c
+++ b/compat/regex/regcomp.c
@@ -1,5 +1,12 @@
+/*
+ * This is git.git's copy of gawk.git's regex engine. Please see that
+ * project for the latest version & to submit patches to this code,
+ * and git.git's compat/regex/README for information on how git's copy
+ * of this code is maintained.
+ */
+
 /* Extended regular expression matching and search library.
-   Copyright (C) 2002-2007,2009,2010 Free Software Foundation, Inc.
+   Copyright (C) 2002-2016 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Isamu Hasegawa <isamu@yamato.ibm.com>.
 
@@ -14,9 +21,20 @@
    Lesser General Public License for more details.
 
    You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
-   02110-1301 USA.  */
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef HAVE_STDINT_H
+#include <stdint.h>
+#endif
+
+#ifdef HAVE_STRINGS_H
+#include <strings.h>
+#endif
+
+#ifdef _LIBC
+# include <locale/weight.h>
+#endif
 
 static reg_errcode_t re_compile_internal (regex_t *preg, const char * pattern,
 					  size_t length, reg_syntax_t syntax);
@@ -126,7 +144,7 @@ static reg_errcode_t mark_opt_subexp (void *extra, bin_tree_t *node);
    POSIX doesn't require that we do anything for REG_NOERROR,
    but why not be nice?  */
 
-const char __re_error_msgid[] attribute_hidden =
+static const char __re_error_msgid[] =
   {
 #define REG_NOERROR_IDX	0
     gettext_noop ("Success")	/* REG_NOERROR */
@@ -150,9 +168,9 @@ const char __re_error_msgid[] attribute_hidden =
     gettext_noop ("Invalid back reference") /* REG_ESUBREG */
     "\0"
 #define REG_EBRACK_IDX	(REG_ESUBREG_IDX + sizeof "Invalid back reference")
-    gettext_noop ("Unmatched [ or [^")	/* REG_EBRACK */
+    gettext_noop ("Unmatched [, [^, [:, [., or [=")	/* REG_EBRACK */
     "\0"
-#define REG_EPAREN_IDX	(REG_EBRACK_IDX + sizeof "Unmatched [ or [^")
+#define REG_EPAREN_IDX	(REG_EBRACK_IDX + sizeof "Unmatched [, [^, [:, [., or [=")
     gettext_noop ("Unmatched ( or \\(") /* REG_EPAREN */
     "\0"
 #define REG_EBRACE_IDX	(REG_EPAREN_IDX + sizeof "Unmatched ( or \\(")
@@ -180,7 +198,7 @@ const char __re_error_msgid[] attribute_hidden =
     gettext_noop ("Unmatched ) or \\)") /* REG_ERPAREN */
   };
 
-const size_t __re_error_msgid_idx[] attribute_hidden =
+static const size_t __re_error_msgid_idx[] =
   {
     REG_NOERROR_IDX,
     REG_NOMATCH_IDX,
@@ -204,20 +222,19 @@ const size_t __re_error_msgid_idx[] attribute_hidden =
 /* Entry points for GNU code.  */
 
 
-#ifdef ZOS_USS
-
-/* For ZOS USS we must define btowc */
-
-wchar_t 
+#ifndef HAVE_BTOWC
+wchar_t
 btowc (int c)
 {
    wchar_t wtmp[2];
    char tmp[2];
+   mbstate_t mbs;
 
+   memset(& mbs, 0, sizeof(mbs));
    tmp[0] = c;
    tmp[1] = 0;
 
-   mbtowc (wtmp, tmp, 1);
+   mbrtowc (wtmp, tmp, 1, & mbs);
    return wtmp[0];
 }
 #endif
@@ -226,12 +243,11 @@ btowc (int c)
    compiles PATTERN (of length LENGTH) and puts the result in BUFP.
    Returns 0 if the pattern was valid, otherwise an error string.
 
-   Assumes the `allocated' (and perhaps `buffer') and `translate' fields
+   Assumes the 'allocated' (and perhaps 'buffer') and 'translate' fields
    are set in BUFP on entry.  */
 
 const char *
-re_compile_pattern (const char *pattern,
-		    size_t length,
+re_compile_pattern (const char *pattern, size_t length,
 		    struct re_pattern_buffer *bufp)
 {
   reg_errcode_t ret;
@@ -254,7 +270,7 @@ re_compile_pattern (const char *pattern,
 weak_alias (__re_compile_pattern, re_compile_pattern)
 #endif
 
-/* Set by `re_set_syntax' to the current regexp syntax to recognize.  Can
+/* Set by 're_set_syntax' to the current regexp syntax to recognize.  Can
    also be assigned to arbitrarily: each pattern buffer stores its own
    syntax, so it can be changed between regex compilations.  */
 /* This has no initializer because initialized variables in Emacs
@@ -303,8 +319,8 @@ weak_alias (__re_compile_fastmap, re_compile_fastmap)
 #endif
 
 static inline void
-__attribute ((always_inline))
-re_set_fastmap (char *fastmap, int icase, int ch)
+__attribute__ ((always_inline))
+re_set_fastmap (char *fastmap, bool icase, int ch)
 {
   fastmap[ch] = 1;
   if (icase)
@@ -318,7 +334,7 @@ static void
 re_compile_fastmap_iter (regex_t *bufp, const re_dfastate_t *init_state,
 			 char *fastmap)
 {
-  volatile re_dfa_t *dfa = (re_dfa_t *) bufp->buffer;
+  re_dfa_t *dfa = (re_dfa_t *) bufp->buffer;
   int node_cnt;
   int icase = (dfa->mb_cur_max == 1 && (bufp->syntax & RE_ICASE));
   for (node_cnt = 0; node_cnt < init_state->nodes.nelem; ++node_cnt)
@@ -332,14 +348,15 @@ re_compile_fastmap_iter (regex_t *bufp, const re_dfastate_t *init_state,
 #ifdef RE_ENABLE_I18N
 	  if ((bufp->syntax & RE_ICASE) && dfa->mb_cur_max > 1)
 	    {
-	      unsigned char *buf = re_malloc (unsigned char, dfa->mb_cur_max), *p;
+	      unsigned char buf[MB_LEN_MAX];
+	      unsigned char *p;
 	      wchar_t wc;
 	      mbstate_t state;
 
 	      p = buf;
 	      *p++ = dfa->nodes[node].opr.c;
 	      while (++node < dfa->nodes_len
-		     && dfa->nodes[node].type == CHARACTER
+		     &&	dfa->nodes[node].type == CHARACTER
 		     && dfa->nodes[node].mb_partial)
 		*p++ = dfa->nodes[node].opr.c;
 	      memset (&state, '\0', sizeof (state));
@@ -348,7 +365,6 @@ re_compile_fastmap_iter (regex_t *bufp, const re_dfastate_t *init_state,
 		  && (__wcrtomb ((char *) buf, towlower (wc), &state)
 		      != (size_t) -1))
 		re_set_fastmap (fastmap, 0, buf[0]);
-	      re_free (buf);
 	    }
 #endif
 	}
@@ -450,15 +466,15 @@ re_compile_fastmap_iter (regex_t *bufp, const re_dfastate_t *init_state,
    PREG is a regex_t *.  We do not expect any fields to be initialized,
    since POSIX says we shouldn't.  Thus, we set
 
-     `buffer' to the compiled pattern;
-     `used' to the length of the compiled pattern;
-     `syntax' to RE_SYNTAX_POSIX_EXTENDED if the
+     'buffer' to the compiled pattern;
+     'used' to the length of the compiled pattern;
+     'syntax' to RE_SYNTAX_POSIX_EXTENDED if the
        REG_EXTENDED bit in CFLAGS is set; otherwise, to
        RE_SYNTAX_POSIX_BASIC;
-     `newline_anchor' to REG_NEWLINE being set in CFLAGS;
-     `fastmap' to an allocated space for the fastmap;
-     `fastmap_accurate' to zero;
-     `re_nsub' to the number of subexpressions in PATTERN.
+     'newline_anchor' to REG_NEWLINE being set in CFLAGS;
+     'fastmap' to an allocated space for the fastmap;
+     'fastmap_accurate' to zero;
+     're_nsub' to the number of subexpressions in PATTERN.
 
    PATTERN is the address of the pattern string.
 
@@ -481,9 +497,7 @@ re_compile_fastmap_iter (regex_t *bufp, const re_dfastate_t *init_state,
    the return codes and their meanings.)  */
 
 int
-regcomp (regex_t *__restrict preg,
-	 const char *__restrict pattern,
-	 int cflags)
+regcomp (regex_t *__restrict preg, const char *__restrict pattern, int cflags)
 {
   reg_errcode_t ret;
   reg_syntax_t syntax = ((cflags & REG_EXTENDED) ? RE_SYNTAX_POSIX_EXTENDED
@@ -542,8 +556,8 @@ weak_alias (__regcomp, regcomp)
    from either regcomp or regexec.   We don't use PREG here.  */
 
 size_t
-regerror(int errcode, const regex_t *__restrict preg,
-	 char *__restrict errbuf, size_t errbuf_size)
+regerror (int errcode, const regex_t *__restrict preg, char *__restrict errbuf,
+	  size_t errbuf_size)
 {
   const char *msg;
   size_t msg_size;
@@ -565,8 +579,12 @@ regerror(int errcode, const regex_t *__restrict preg,
     {
       if (BE (msg_size > errbuf_size, 0))
 	{
+#if defined HAVE_MEMPCPY || defined _LIBC
+	  *((char *) __mempcpy (errbuf, msg, errbuf_size - 1)) = '\0';
+#else
 	  memcpy (errbuf, msg, errbuf_size - 1);
 	  errbuf[errbuf_size - 1] = 0;
+#endif
 	}
       else
 	memcpy (errbuf, msg, msg_size);
@@ -679,8 +697,7 @@ char *
    regcomp/regexec above without link errors.  */
 weak_function
 # endif
-re_comp (s)
-     const char *s;
+re_comp (const char *s)
 {
   reg_errcode_t ret;
   char *fastmap;
@@ -709,7 +726,7 @@ re_comp (s)
 				 + __re_error_msgid_idx[(int) REG_ESPACE]);
     }
 
-  /* Since `re_exec' always passes NULL for the `regs' argument, we
+  /* Since 're_exec' always passes NULL for the 'regs' argument, we
      don't need to initialize the pattern buffer fields which affect it.  */
 
   /* Match anchors at newlines.  */
@@ -787,7 +804,7 @@ re_compile_internal (regex_t *preg, const char * pattern, size_t length,
   __libc_lock_init (dfa->lock);
 
   err = re_string_construct (&regexp, pattern, length, preg->translate,
-			     syntax & RE_ICASE, dfa);
+			     (syntax & RE_ICASE) != 0, dfa);
   if (BE (err != REG_NOERROR, 0))
     {
     re_compile_internal_free_return:
@@ -886,20 +903,9 @@ init_dfa (re_dfa_t *dfa, size_t pat_len)
     codeset_name = strchr (codeset_name, '.') + 1;
 # endif
 
-  /* strcasecmp isn't a standard interface. brute force check */
-#if 0
   if (strcasecmp (codeset_name, "UTF-8") == 0
       || strcasecmp (codeset_name, "UTF8") == 0)
     dfa->is_utf8 = 1;
-#else
-  if (   (codeset_name[0] == 'U' || codeset_name[0] == 'u')
-      && (codeset_name[1] == 'T' || codeset_name[1] == 't')
-      && (codeset_name[2] == 'F' || codeset_name[2] == 'f')
-      && (codeset_name[3] == '-'
-          ? codeset_name[4] == '8' && codeset_name[5] == '\0'
-          : codeset_name[3] == '8' && codeset_name[4] == '\0'))
-    dfa->is_utf8 = 1;
-#endif
 
   /* We check exhaustively in the loop below if this charset is a
      superset of ASCII.  */
@@ -964,6 +970,39 @@ init_word_char (re_dfa_t *dfa)
 {
   int i, j, ch;
   dfa->word_ops_used = 1;
+#ifndef GAWK
+  if (BE (dfa->map_notascii == 0, 1))
+    {
+      if (sizeof (dfa->word_char[0]) == 8)
+	{
+          /* The extra temporaries here avoid "implicitly truncated"
+             warnings in the case when this is dead code, i.e. 32-bit.  */
+          const uint64_t wc0 = UINT64_C (0x03ff000000000000);
+          const uint64_t wc1 = UINT64_C (0x07fffffe87fffffe);
+	  dfa->word_char[0] = wc0;
+	  dfa->word_char[1] = wc1;
+	  i = 2;
+	}
+      else if (sizeof (dfa->word_char[0]) == 4)
+	{
+	  dfa->word_char[0] = UINT32_C (0x00000000);
+	  dfa->word_char[1] = UINT32_C (0x03ff0000);
+	  dfa->word_char[2] = UINT32_C (0x87fffffe);
+	  dfa->word_char[3] = UINT32_C (0x07fffffe);
+	  i = 4;
+	}
+      else
+	abort ();
+      ch = 128;
+
+      if (BE (dfa->is_utf8, 1))
+	{
+	  memset (&dfa->word_char[i], '\0', (SBC_MAX - ch) / 8);
+	  return;
+	}
+    }
+#endif
+
   for (i = 0, ch = 0; i < BITSET_WORDS; ++i)
     for (j = 0; j < BITSET_WORD_BITS; ++j, ++ch)
       if (isalnum (ch) || ch == '_')
@@ -1162,7 +1201,12 @@ analyze (regex_t *preg)
 	  || dfa->eclosures == NULL, 0))
     return REG_ESPACE;
 
-  dfa->subexp_map = re_malloc (int, preg->re_nsub);
+  /* some malloc()-checkers don't like zero allocations */
+  if (preg->re_nsub > 0)
+    dfa->subexp_map = re_malloc (int, preg->re_nsub);
+  else
+    dfa->subexp_map = NULL;
+
   if (dfa->subexp_map != NULL)
     {
       int i;
@@ -1510,7 +1554,7 @@ duplicate_node_closure (re_dfa_t *dfa, int top_org_node, int top_clone_node,
 	     destination.  */
 	  org_dest = dfa->edests[org_node].elems[0];
 	  re_node_set_empty (dfa->edests + clone_node);
-	  /* If the node is root_node itself, it means the epsilon clsoure
+	  /* If the node is root_node itself, it means the epsilon closure
 	     has a loop.   Then tie it to the destination of the root_node.  */
 	  if (org_node == root_node && clone_node != org_node)
 	    {
@@ -1519,7 +1563,7 @@ duplicate_node_closure (re_dfa_t *dfa, int top_org_node, int top_clone_node,
 		return REG_ESPACE;
 	      break;
 	    }
-	  /* In case of the node has another constraint, add it.  */
+	  /* In case the node has another constraint, append it.  */
 	  constraint |= dfa->nodes[org_node].constraint;
 	  clone_dest = duplicate_node (dfa, org_dest, constraint);
 	  if (BE (clone_dest == -1, 0))
@@ -1662,7 +1706,7 @@ calc_eclosure (re_dfa_t *dfa)
       /* If we have already calculated, skip it.  */
       if (dfa->eclosures[node_idx].nelem != 0)
 	continue;
-      /* Calculate epsilon closure of `node_idx'.  */
+      /* Calculate epsilon closure of 'node_idx'.  */
       err = calc_eclosure_iter (&eclosure_elem, dfa, node_idx, 1);
       if (BE (err != REG_NOERROR, 0))
 	return err;
@@ -1729,11 +1773,11 @@ calc_eclosure_iter (re_node_set *new_set, re_dfa_t *dfa, int node, int root)
 	  }
 	else
 	  eclosure_elem = dfa->eclosures[edest];
-	/* Merge the epsilon closure of `edest'.  */
+	/* Merge the epsilon closure of 'edest'.  */
 	err = re_node_set_merge (&eclosure, &eclosure_elem);
 	if (BE (err != REG_NOERROR, 0))
 	  return err;
-	/* If the epsilon closure of `edest' is incomplete,
+	/* If the epsilon closure of 'edest' is incomplete,
 	   the epsilon closure of this node is also incomplete.  */
 	if (dfa->eclosures[edest].nelem == 0)
 	  {
@@ -2095,7 +2139,7 @@ peek_token_bracket (re_token_t *token, re_string_t *input, reg_syntax_t syntax)
 
 /* Entry point of the parser.
    Parse the regular expression REGEXP and return the structure tree.
-   If an error has occurred, ERR is set by error code, and return NULL.
+   If an error occurs, ERR is set by error code, and return NULL.
    This function build the following tree, from regular expression <reg_exp>:
 	   CAT
 	   / \
@@ -2137,7 +2181,7 @@ parse (re_string_t *regexp, regex_t *preg, reg_syntax_t syntax,
 	  /   \
    <branch1> <branch2>
 
-   ALT means alternative, which represents the operator `|'.  */
+   ALT means alternative, which represents the operator '|'.  */
 
 static bin_tree_t *
 parse_reg_exp (re_string_t *regexp, regex_t *preg, re_token_t *token,
@@ -2145,6 +2189,7 @@ parse_reg_exp (re_string_t *regexp, regex_t *preg, re_token_t *token,
 {
   re_dfa_t *dfa = (re_dfa_t *) preg->buffer;
   bin_tree_t *tree, *branch = NULL;
+  bitset_word_t initial_bkref_map = dfa->completed_bkref_map;
   tree = parse_branch (regexp, preg, token, syntax, nest, err);
   if (BE (*err != REG_NOERROR && tree == NULL, 0))
     return NULL;
@@ -2155,9 +2200,16 @@ parse_reg_exp (re_string_t *regexp, regex_t *preg, re_token_t *token,
       if (token->type != OP_ALT && token->type != END_OF_RE
 	  && (nest == 0 || token->type != OP_CLOSE_SUBEXP))
 	{
+	  bitset_word_t accumulated_bkref_map = dfa->completed_bkref_map;
+	  dfa->completed_bkref_map = initial_bkref_map;
 	  branch = parse_branch (regexp, preg, token, syntax, nest, err);
 	  if (BE (*err != REG_NOERROR && branch == NULL, 0))
-	    return NULL;
+	    {
+	      if (tree != NULL)
+		postorder (tree, free_tree, NULL);
+	      return NULL;
+	    }
+	  dfa->completed_bkref_map |= accumulated_bkref_map;
 	}
       else
 	branch = NULL;
@@ -2196,16 +2248,21 @@ parse_branch (re_string_t *regexp, regex_t *preg, re_token_t *token,
       exp = parse_expression (regexp, preg, token, syntax, nest, err);
       if (BE (*err != REG_NOERROR && exp == NULL, 0))
 	{
+	  if (tree != NULL)
+	    postorder (tree, free_tree, NULL);
 	  return NULL;
 	}
       if (tree != NULL && exp != NULL)
 	{
-	  tree = create_tree (dfa, tree, exp, CONCAT);
-	  if (tree == NULL)
+	  bin_tree_t *newtree = create_tree (dfa, tree, exp, CONCAT);
+	  if (newtree == NULL)
 	    {
+	      postorder (exp, free_tree, NULL);
+	      postorder (tree, free_tree, NULL);
 	      *err = REG_ESPACE;
 	      return NULL;
 	    }
+	  tree = newtree;
 	}
       else if (tree == NULL)
 	tree = exp;
@@ -2413,14 +2470,21 @@ parse_expression (re_string_t *regexp, regex_t *preg, re_token_t *token,
   while (token->type == OP_DUP_ASTERISK || token->type == OP_DUP_PLUS
 	 || token->type == OP_DUP_QUESTION || token->type == OP_OPEN_DUP_NUM)
     {
-      tree = parse_dup_op (tree, regexp, dfa, token, syntax, err);
-      if (BE (*err != REG_NOERROR && tree == NULL, 0))
-	return NULL;
+      bin_tree_t *dup_tree = parse_dup_op (tree, regexp, dfa, token, syntax, err);
+      if (BE (*err != REG_NOERROR && dup_tree == NULL, 0))
+	{
+	  if (tree != NULL)
+	    postorder (tree, free_tree, NULL);
+	  return NULL;
+	}
+      tree = dup_tree;
       /* In BRE consecutive duplications are not allowed.  */
       if ((syntax & RE_CONTEXT_INVALID_DUP)
 	  && (token->type == OP_DUP_ASTERISK
 	      || token->type == OP_OPEN_DUP_NUM))
 	{
+	  if (tree != NULL)
+	    postorder (tree, free_tree, NULL);
 	  *err = REG_BADRPT;
 	  return NULL;
 	}
@@ -2454,7 +2518,11 @@ parse_sub_exp (re_string_t *regexp, regex_t *preg, re_token_t *token,
     {
       tree = parse_reg_exp (regexp, preg, token, syntax, nest, err);
       if (BE (*err == REG_NOERROR && token->type != OP_CLOSE_SUBEXP, 0))
-	*err = REG_EPAREN;
+	{
+	  if (tree != NULL)
+	    postorder (tree, free_tree, NULL);
+	  *err = REG_EPAREN;
+	}
       if (BE (*err != REG_NOERROR, 0))
 	return NULL;
     }
@@ -2480,13 +2548,7 @@ parse_dup_op (bin_tree_t *elem, re_string_t *regexp, re_dfa_t *dfa,
 {
   bin_tree_t *tree = NULL, *old_tree = NULL;
   int i, start, end, start_idx = re_string_cur_idx (regexp);
-#ifndef RE_TOKEN_INIT_BUG
   re_token_t start_token = *token;
-#else
-  re_token_t start_token;
-
-  memcpy ((void *) &start_token, (void *) token, sizeof start_token);
-#endif
 
   if (token->type == OP_OPEN_DUP_NUM)
     {
@@ -2571,13 +2633,15 @@ parse_dup_op (bin_tree_t *elem, re_string_t *regexp, re_dfa_t *dfa,
 
       /* Duplicate ELEM before it is marked optional.  */
       elem = duplicate_tree (elem, dfa);
+      if (BE (elem == NULL, 0))
+        goto parse_dup_op_espace;
       old_tree = tree;
     }
   else
     old_tree = NULL;
 
   if (elem->token.type == SUBEXP)
-    postorder (elem, mark_opt_subexp, (void *) (intptr_t) elem->token.opr.idx);
+    postorder (elem, mark_opt_subexp, (void *) (long) elem->token.opr.idx);
 
   tree = create_tree (dfa, elem, NULL, (end == -1 ? OP_DUP_ASTERISK : OP_ALT));
   if (BE (tree == NULL, 0))
@@ -2623,11 +2687,12 @@ parse_dup_op (bin_tree_t *elem, re_string_t *regexp, re_dfa_t *dfa,
 static reg_errcode_t
 internal_function
 # ifdef RE_ENABLE_I18N
-build_range_exp (bitset_t sbcset, re_charset_t *mbcset, int *range_alloc,
-		 bracket_elem_t *start_elem, bracket_elem_t *end_elem)
+build_range_exp (reg_syntax_t syntax, bitset_t sbcset, re_charset_t *mbcset,
+		int *range_alloc, bracket_elem_t *start_elem,
+		bracket_elem_t *end_elem)
 # else /* not RE_ENABLE_I18N */
-build_range_exp (bitset_t sbcset, bracket_elem_t *start_elem,
-		 bracket_elem_t *end_elem)
+build_range_exp (reg_syntax_t syntax, bitset_t sbcset,
+		bracket_elem_t *start_elem, bracket_elem_t *end_elem)
 # endif /* not RE_ENABLE_I18N */
 {
   unsigned int start_ch, end_ch;
@@ -2650,7 +2715,6 @@ build_range_exp (bitset_t sbcset, bracket_elem_t *start_elem,
     wchar_t wc;
     wint_t start_wc;
     wint_t end_wc;
-    wchar_t cmp_buf[6] = {L'\0', L'\0', L'\0', L'\0', L'\0', L'\0'};
 
     start_ch = ((start_elem->type == SB_CHAR) ? start_elem->opr.ch
 		: ((start_elem->type == COLL_SYM) ? start_elem->opr.name[0]
@@ -2676,9 +2740,7 @@ build_range_exp (bitset_t sbcset, bracket_elem_t *start_elem,
 #endif
     if (start_wc == WEOF || end_wc == WEOF)
       return REG_ECOLLATE;
-    cmp_buf[0] = start_wc;
-    cmp_buf[4] = end_wc;
-    if (wcscoll (cmp_buf, cmp_buf + 4) > 0)
+    else if (BE ((syntax & RE_NO_EMPTY_RANGES) && start_wc > end_wc, 0))
       return REG_ERANGE;
 
     /* Got valid collation sequence values, add them as a new entry.
@@ -2705,7 +2767,14 @@ build_range_exp (bitset_t sbcset, bracket_elem_t *start_elem,
 					new_nranges);
 
 	    if (BE (new_array_start == NULL || new_array_end == NULL, 0))
-	      return REG_ESPACE;
+              {
+                 /* if one is not NULL, free it to avoid leaks */
+                 if (new_array_start != NULL)
+                     re_free(new_array_start);
+                 if (new_array_end != NULL)
+                     re_free(new_array_end);
+	         return REG_ESPACE;
+	      }
 
 	    mbcset->range_starts = new_array_start;
 	    mbcset->range_ends = new_array_end;
@@ -2719,10 +2788,8 @@ build_range_exp (bitset_t sbcset, bracket_elem_t *start_elem,
     /* Build the table for single byte characters.  */
     for (wc = 0; wc < SBC_MAX; ++wc)
       {
-	cmp_buf[2] = wc;
-	if (wcscoll (cmp_buf, cmp_buf + 2) <= 0
-	    && wcscoll (cmp_buf + 2, cmp_buf + 4) <= 0)
-	  bitset_set (sbcset, wc);
+         if (start_wc <= wc && wc <= end_wc)
+           bitset_set (sbcset, wc);
       }
   }
 # else /* not RE_ENABLE_I18N */
@@ -2789,41 +2856,30 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
   const unsigned char *extra;
 
   /* Local function for parse_bracket_exp used in _LIBC environment.
-     Seek the collating symbol entry correspondings to NAME.
-     Return the index of the symbol in the SYMB_TABLE.  */
+     Seek the collating symbol entry corresponding to NAME.
+     Return the index of the symbol in the SYMB_TABLE,
+     or -1 if not found.  */
 
   auto inline int32_t
-  __attribute ((always_inline))
-  seek_collating_symbol_entry (name, name_len)
-	 const unsigned char *name;
-	 size_t name_len;
+  __attribute__ ((always_inline))
+  seek_collating_symbol_entry (const unsigned char *name, size_t name_len)
     {
-      int32_t hash = elem_hash ((const char *) name, name_len);
-      int32_t elem = hash % table_size;
-      if (symb_table[2 * elem] != 0)
-	{
-	  int32_t second = hash % (table_size - 2) + 1;
-
-	  do
-	    {
-	      /* First compare the hashing value.  */
-	      if (symb_table[2 * elem] == hash
-		  /* Compare the length of the name.  */
-		  && name_len == extra[symb_table[2 * elem + 1]]
-		  /* Compare the name.  */
-		  && memcmp (name, &extra[symb_table[2 * elem + 1] + 1],
-			     name_len) == 0)
-		{
-		  /* Yep, this is the entry.  */
-		  break;
-		}
+      int32_t elem;
 
-	      /* Next entry.  */
-	      elem += second;
-	    }
-	  while (symb_table[2 * elem] != 0);
-	}
-      return elem;
+      for (elem = 0; elem < table_size; elem++)
+	if (symb_table[2 * elem] != 0)
+	  {
+	    int32_t idx = symb_table[2 * elem + 1];
+	    /* Skip the name of collating element name.  */
+	    idx += 1 + extra[idx];
+	    if (/* Compare the length of the name.  */
+		name_len == extra[idx]
+		/* Compare the name.  */
+		&& memcmp (name, &extra[idx + 1], name_len) == 0)
+	      /* Yep, this is the entry.  */
+	      return elem;
+	  }
+      return -1;
     }
 
   /* Local function for parse_bracket_exp used in _LIBC environment.
@@ -2831,9 +2887,8 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
      Return the value if succeeded, UINT_MAX otherwise.  */
 
   auto inline unsigned int
-  __attribute ((always_inline))
-  lookup_collation_sequence_value (br_elem)
-	 bracket_elem_t *br_elem;
+  __attribute__ ((always_inline))
+  lookup_collation_sequence_value (bracket_elem_t *br_elem)
     {
       if (br_elem->type == SB_CHAR)
 	{
@@ -2861,7 +2916,7 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
 	      int32_t elem, idx;
 	      elem = seek_collating_symbol_entry (br_elem->opr.name,
 						  sym_name_len);
-	      if (symb_table[2 * elem] != 0)
+	      if (elem != -1)
 		{
 		  /* We found the entry.  */
 		  idx = symb_table[2 * elem + 1];
@@ -2879,7 +2934,7 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
 		  /* Return the collation sequence value.  */
 		  return *(unsigned int *) (extra + idx);
 		}
-	      else if (symb_table[2 * elem] == 0 && sym_name_len == 1)
+	      else if (sym_name_len == 1)
 		{
 		  /* No valid character.  Match it as a single byte
 		     character.  */
@@ -2900,12 +2955,9 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
      update it.  */
 
   auto inline reg_errcode_t
-  __attribute ((always_inline))
-  build_range_exp (sbcset, mbcset, range_alloc, start_elem, end_elem)
-	 re_charset_t *mbcset;
-	 int *range_alloc;
-	 bitset_t sbcset;
-	 bracket_elem_t *start_elem, *end_elem;
+  __attribute__ ((always_inline))
+  build_range_exp (bitset_t sbcset, re_charset_t *mbcset, int *range_alloc,
+		   bracket_elem_t *start_elem, bracket_elem_t *end_elem)
     {
       unsigned int ch;
       uint32_t start_collseq;
@@ -2983,26 +3035,23 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
      pointer argument since we may update it.  */
 
   auto inline reg_errcode_t
-  __attribute ((always_inline))
-  build_collating_symbol (sbcset, mbcset, coll_sym_alloc, name)
-	 re_charset_t *mbcset;
-	 int *coll_sym_alloc;
-	 bitset_t sbcset;
-	 const unsigned char *name;
+  __attribute__ ((always_inline))
+  build_collating_symbol (bitset_t sbcset, re_charset_t *mbcset,
+			  int *coll_sym_alloc, const unsigned char *name)
     {
       int32_t elem, idx;
       size_t name_len = strlen ((const char *) name);
       if (nrules != 0)
 	{
 	  elem = seek_collating_symbol_entry (name, name_len);
-	  if (symb_table[2 * elem] != 0)
+	  if (elem != -1)
 	    {
 	      /* We found the entry.  */
 	      idx = symb_table[2 * elem + 1];
 	      /* Skip the name of collating element name.  */
 	      idx += 1 + extra[idx];
 	    }
-	  else if (symb_table[2 * elem] == 0 && name_len == 1)
+	  else if (name_len == 1)
 	    {
 	      /* No valid character, treat it as a normal
 		 character.  */
@@ -3082,6 +3131,10 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
   if (BE (sbcset == NULL, 0))
 #endif /* RE_ENABLE_I18N */
     {
+      re_free (sbcset);
+#ifdef RE_ENABLE_I18N
+      re_free (mbcset);
+#endif
       *err = REG_ESPACE;
       return NULL;
     }
@@ -3123,6 +3176,7 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
       re_token_t token2;
 
       start_elem.opr.name = start_name_buf;
+      start_elem.type = COLL_SYM;
       ret = parse_bracket_element (&start_elem, regexp, token, token_len, dfa,
 				   syntax, first_round);
       if (BE (ret != REG_NOERROR, 0))
@@ -3166,6 +3220,7 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
       if (is_range_exp == 1)
 	{
 	  end_elem.opr.name = end_name_buf;
+	  end_elem.type = COLL_SYM;
 	  ret = parse_bracket_element (&end_elem, regexp, &token2, token_len2,
 				       dfa, syntax, 1);
 	  if (BE (ret != REG_NOERROR, 0))
@@ -3177,15 +3232,15 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token,
 	  token_len = peek_token_bracket (token, regexp, syntax);
 
 #ifdef _LIBC
-	  *err = build_range_exp (sbcset, mbcset, &range_alloc,
+	  *err = build_range_exp (syntax, sbcset, mbcset, &range_alloc,
 				  &start_elem, &end_elem);
 #else
 # ifdef RE_ENABLE_I18N
-	  *err = build_range_exp (sbcset,
+	  *err = build_range_exp (syntax, sbcset,
 				  dfa->mb_cur_max > 1 ? mbcset : NULL,
 				  &range_alloc, &start_elem, &end_elem);
 # else
-	  *err = build_range_exp (sbcset, &start_elem, &end_elem);
+	  *err = build_range_exp (syntax, sbcset, &start_elem, &end_elem);
 # endif
 #endif /* RE_ENABLE_I18N */
 	  if (BE (*err != REG_NOERROR, 0))
@@ -3439,8 +3494,6 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
       int32_t idx1, idx2;
       unsigned int ch;
       size_t len;
-      /* This #include defines a local function!  */
-# include <locale/weight.h>
       /* Calculate the index for equivalence class.  */
       cp = name;
       table = (const int32_t *) _NL_CURRENT (LC_COLLATE, _NL_COLLATE_TABLEMB);
@@ -3450,19 +3503,19 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
 						   _NL_COLLATE_EXTRAMB);
       indirect = (const int32_t *) _NL_CURRENT (LC_COLLATE,
 						_NL_COLLATE_INDIRECTMB);
-      idx1 = findidx (&cp);
-      if (BE (idx1 == 0 || cp < name + strlen ((const char *) name), 0))
+      idx1 = findidx (table, indirect, extra, &cp, -1);
+      if (BE (idx1 == 0 || *cp != '\0', 0))
 	/* This isn't a valid character.  */
 	return REG_ECOLLATE;
 
-      /* Build single byte matcing table for this equivalence class.  */
+      /* Build single byte matching table for this equivalence class.  */
       char_buf[1] = (unsigned char) '\0';
       len = weights[idx1 & 0xffffff];
       for (ch = 0; ch < SBC_MAX; ++ch)
 	{
 	  char_buf[0] = ch;
 	  cp = char_buf;
-	  idx2 = findidx (&cp);
+	  idx2 = findidx (table, indirect, extra, &cp, 1);
 /*
 	  idx2 = table[ch];
 */
@@ -3630,6 +3683,13 @@ build_charclass_op (re_dfa_t *dfa, RE_TRANSLATE_TYPE trans,
   if (BE (sbcset == NULL, 0))
 #endif /* not RE_ENABLE_I18N */
     {
+      /* if one is not NULL, free it to avoid leaks */
+      if (sbcset != NULL)
+         free(sbcset);
+#ifdef RE_ENABLE_I18N
+      if (mbcset != NULL)
+         free(mbcset);
+#endif
       *err = REG_ESPACE;
       return NULL;
     }
@@ -3672,6 +3732,7 @@ build_charclass_op (re_dfa_t *dfa, RE_TRANSLATE_TYPE trans,
 #endif
 
   /* Build a tree for simple bracket.  */
+  memset(& br_token, 0, sizeof(br_token));	/* silence "not initialized" errors froms static checkers */
   br_token.type = SIMPLE_BRACKET;
   br_token.opr.sbcset = sbcset;
   tree = create_token_tree (dfa, NULL, NULL, &br_token);
@@ -3715,7 +3776,7 @@ build_charclass_op (re_dfa_t *dfa, RE_TRANSLATE_TYPE trans,
 /* This is intended for the expressions like "a{1,3}".
    Fetch a number from `input', and return the number.
    Return -1, if the number field is empty like "{,1}".
-   Return -2, if an error has occurred.  */
+   Return -2, If an error is occured.  */
 
 static int
 fetch_number (re_string_t *input, re_token_t *token, reg_syntax_t syntax)
@@ -3762,6 +3823,7 @@ create_tree (re_dfa_t *dfa, bin_tree_t *left, bin_tree_t *right,
 	     re_token_type_t type)
 {
   re_token_t t;
+  memset(& t, 0, sizeof(t));	/* silence "not initialized" errors froms static checkers */
   t.type = type;
   return create_token_tree (dfa, left, right, &t);
 }
@@ -3806,7 +3868,7 @@ create_token_tree (re_dfa_t *dfa, bin_tree_t *left, bin_tree_t *right,
 static reg_errcode_t
 mark_opt_subexp (void *extra, bin_tree_t *node)
 {
-  int idx = (int) (intptr_t) extra;
+  int idx = (int) (long) extra;
   if (node->token.type == SUBEXP && node->token.opr.idx == idx)
     node->token.opt_subexp = 1;
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/7] fixup! compat/regex: update the gawk regex engine from upstream
  2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2017-05-04 22:00 ` [PATCH 3/7] fixup! " Ævar Arnfjörð Bjarmason
@ 2017-05-04 22:00 ` Ævar Arnfjörð Bjarmason
  2017-05-04 22:00 ` [PATCH 5/7] " Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-04 22:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins,
	Ævar Arnfjörð Bjarmason

---
 compat/regex/regex.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/compat/regex/regex.c b/compat/regex/regex.c
index 5cb23e5d59..d6e525e567 100644
--- a/compat/regex/regex.c
+++ b/compat/regex/regex.c
@@ -1,5 +1,12 @@
+/*
+ * This is git.git's copy of gawk.git's regex engine. Please see that
+ * project for the latest version & to submit patches to this code,
+ * and git.git's compat/regex/README for information on how git's copy
+ * of this code is maintained.
+ */
+
 /* Extended regular expression matching and search library.
-   Copyright (C) 2002, 2003, 2005 Free Software Foundation, Inc.
+   Copyright (C) 2002-2016 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Isamu Hasegawa <isamu@yamato.ibm.com>.
 
@@ -14,15 +21,14 @@
    Lesser General Public License for more details.
 
    You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
-   02110-1301 USA.  */
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
 
 #ifdef HAVE_CONFIG_H
 #include "config.h"
 #endif
 
-/* Make sure no one compiles this code with a C++ compiler.  */
+/* Make sure noone compiles this code with a C++ compiler.  */
 #ifdef __cplusplus
 # error "This is C code, use a C compiler"
 #endif
@@ -52,15 +58,15 @@
 # include "../locale/localeinfo.h"
 #endif
 
-#if defined (_MSC_VER)
-#include <stdio.h> /* for size_t */
-#endif
-
 /* On some systems, limits.h sets RE_DUP_MAX to a lower value than
    GNU regex allows.  Include it before <regex.h>, which correctly
    #undefs RE_DUP_MAX and sets it to the right value.  */
 #include <limits.h>
-#include <stdint.h>
+
+/* This header defines the MIN and MAX macros.  */
+#ifdef HAVE_SYS_PARAM_H
+#include <sys/param.h>
+#endif /* HAVE_SYS_PARAM_H */
 
 #ifdef GAWK
 #undef alloca
@@ -70,10 +76,8 @@
 #include "regex_internal.h"
 
 #include "regex_internal.c"
-#ifdef GAWK
-#define bool int
-#define true (1)
-#define false (0)
+#ifndef HAVE_STDBOOL_H
+#include "missing_d/gawkbool.h"
 #endif
 #include "regcomp.c"
 #include "regexec.c"
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/7] fixup! compat/regex: update the gawk regex engine from upstream
  2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2017-05-04 22:00 ` [PATCH 4/7] " Ævar Arnfjörð Bjarmason
@ 2017-05-04 22:00 ` Ævar Arnfjörð Bjarmason
  2017-05-04 22:00 ` [PATCH 6/7] " Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-04 22:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins,
	Ævar Arnfjörð Bjarmason

---
 compat/regex/regex_internal.c | 118 +++++++++++++++++++++++++-----------------
 compat/regex/regex_internal.h | 118 ++++++++++++++++++++++++++----------------
 2 files changed, 144 insertions(+), 92 deletions(-)

diff --git a/compat/regex/regex_internal.c b/compat/regex/regex_internal.c
index d4121f2f4f..6d766114a1 100644
--- a/compat/regex/regex_internal.c
+++ b/compat/regex/regex_internal.c
@@ -1,5 +1,12 @@
+/*
+ * This is git.git's copy of gawk.git's regex engine. Please see that
+ * project for the latest version & to submit patches to this code,
+ * and git.git's compat/regex/README for information on how git's copy
+ * of this code is maintained.
+ */
+
 /* Extended regular expression matching and search library.
-   Copyright (C) 2002-2006, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2002-2016 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Isamu Hasegawa <isamu@yamato.ibm.com>.
 
@@ -14,9 +21,8 @@
    Lesser General Public License for more details.
 
    You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
-   02110-1301 USA.  */
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
 
 static void re_string_construct_common (const char *str, int len,
 					re_string_t *pstr,
@@ -45,7 +51,7 @@ MAX(size_t a, size_t b)
    re_string_reconstruct before using the object.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_string_allocate (re_string_t *pstr, const char *str, int len, int init_len,
 		    RE_TRANSLATE_TYPE trans, int icase, const re_dfa_t *dfa)
 {
@@ -73,7 +79,7 @@ re_string_allocate (re_string_t *pstr, const char *str, int len, int init_len,
 /* This function allocate the buffers, and initialize them.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_string_construct (re_string_t *pstr, const char *str, int len,
 		     RE_TRANSLATE_TYPE trans, int icase, const re_dfa_t *dfa)
 {
@@ -136,7 +142,7 @@ re_string_construct (re_string_t *pstr, const char *str, int len,
 /* Helper functions for re_string_allocate, and re_string_construct.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_string_realloc_buffers (re_string_t *pstr, int new_buf_len)
 {
 #ifdef RE_ENABLE_I18N
@@ -246,13 +252,8 @@ build_wcs_buffer (re_string_t *pstr)
       else
 	p = (const char *) pstr->raw_mbs + pstr->raw_mbs_idx + byte_idx;
       mbclen = __mbrtowc (&wc, p, remain_len, &pstr->cur_state);
-      if (BE (mbclen == (size_t) -2, 0))
-	{
-	  /* The buffer doesn't have enough space, finish to build.  */
-	  pstr->cur_state = prev_st;
-	  break;
-	}
-      else if (BE (mbclen == (size_t) -1 || mbclen == 0, 0))
+      if (BE (mbclen == (size_t) -1 || mbclen == 0
+	      || (mbclen == (size_t) -2 && pstr->bufs_len >= pstr->len), 0))
 	{
 	  /* We treat these cases as a singlebyte character.  */
 	  mbclen = 1;
@@ -261,6 +262,12 @@ build_wcs_buffer (re_string_t *pstr)
 	    wc = pstr->trans[wc];
 	  pstr->cur_state = prev_st;
 	}
+      else if (BE (mbclen == (size_t) -2, 0))
+	{
+	  /* The buffer doesn't have enough space, finish to build.  */
+	  pstr->cur_state = prev_st;
+	  break;
+	}
 
       /* Write wide character and padding.  */
       pstr->wcs[byte_idx++] = wc;
@@ -276,7 +283,7 @@ build_wcs_buffer (re_string_t *pstr)
    but for REG_ICASE.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 build_wcs_upper_buffer (re_string_t *pstr)
 {
   mbstate_t prev_st;
@@ -326,7 +333,7 @@ build_wcs_upper_buffer (re_string_t *pstr)
 		  size_t mbcdlen;
 
 		  wcu = towupper (wc);
-		  mbcdlen = wcrtomb (buf, wcu, &prev_st);
+		  mbcdlen = __wcrtomb (buf, wcu, &prev_st);
 		  if (BE (mbclen == mbcdlen, 1))
 		    memcpy (pstr->mbs + byte_idx, buf, mbclen);
 		  else
@@ -343,9 +350,11 @@ build_wcs_upper_buffer (re_string_t *pstr)
 	      for (remain_len = byte_idx + mbclen - 1; byte_idx < remain_len ;)
 		pstr->wcs[byte_idx++] = WEOF;
 	    }
-	  else if (mbclen == (size_t) -1 || mbclen == 0)
+	  else if (mbclen == (size_t) -1 || mbclen == 0
+		   || (mbclen == (size_t) -2 && pstr->bufs_len >= pstr->len))
 	    {
-	      /* It is an invalid character or '\0'.  Just use the byte.  */
+	      /* It is an invalid character, an incomplete character
+		 at the end of the string, or '\0'.  Just use the byte.  */
 	      int ch = pstr->raw_mbs[pstr->raw_mbs_idx + byte_idx];
 	      pstr->mbs[byte_idx] = ch;
 	      /* And also cast it to wide char.  */
@@ -458,7 +467,8 @@ build_wcs_upper_buffer (re_string_t *pstr)
 	    for (remain_len = byte_idx + mbclen - 1; byte_idx < remain_len ;)
 	      pstr->wcs[byte_idx++] = WEOF;
 	  }
-	else if (mbclen == (size_t) -1 || mbclen == 0)
+	else if (mbclen == (size_t) -1 || mbclen == 0
+		 || (mbclen == (size_t) -2 && pstr->bufs_len >= pstr->len))
 	  {
 	    /* It is an invalid character or '\0'.  Just use the byte.  */
 	    int ch = pstr->raw_mbs[pstr->raw_mbs_idx + src_idx];
@@ -505,11 +515,11 @@ re_string_skip_chars (re_string_t *pstr, int new_raw_idx, wint_t *last_wc)
        rawbuf_idx < new_raw_idx;)
     {
       wchar_t wc2;
-      int remain_len = pstr->len - rawbuf_idx;
+      int remain_len = pstr->raw_len - rawbuf_idx;
       prev_st = pstr->cur_state;
       mbclen = __mbrtowc (&wc2, (const char *) pstr->raw_mbs + rawbuf_idx,
 			  remain_len, &pstr->cur_state);
-      if (BE (mbclen == (size_t) -2 || mbclen == (size_t) -1 || mbclen == 0, 0))
+      if (BE ((ssize_t) mbclen <= 0, 0))
 	{
 	  /* We treat these cases as a single byte character.  */
 	  if (mbclen == 0 || remain_len == 0)
@@ -577,7 +587,7 @@ re_string_translate_buffer (re_string_t *pstr)
    convert to upper case in case of REG_ICASE, apply translation.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_string_reconstruct (re_string_t *pstr, int idx, int eflags)
 {
   int offset = idx - pstr->raw_mbs_idx;
@@ -685,7 +695,7 @@ re_string_reconstruct (re_string_t *pstr, int idx, int eflags)
 			 pstr->valid_len - offset);
 	      pstr->valid_len -= offset;
 	      pstr->valid_raw_len -= offset;
-#if DEBUG
+#if defined DEBUG && DEBUG
 	      assert (pstr->valid_len > 0);
 #endif
 	    }
@@ -741,16 +751,18 @@ re_string_reconstruct (re_string_t *pstr, int idx, int eflags)
 			  unsigned char buf[6];
 			  size_t mbclen;
 
+			  const unsigned char *pp = p;
 			  if (BE (pstr->trans != NULL, 0))
 			    {
 			      int i = mlen < 6 ? mlen : 6;
 			      while (--i >= 0)
 				buf[i] = pstr->trans[p[i]];
+			      pp = buf;
 			    }
 			  /* XXX Don't use mbrtowc, we know which conversion
 			     to use (UTF-8 -> UCS4).  */
 			  memset (&cur_state, 0, sizeof (cur_state));
-			  mbclen = __mbrtowc (&wc2, (const char *) p, mlen,
+			  mbclen = __mbrtowc (&wc2, (const char *) pp, mlen,
 					      &cur_state);
 			  if (raw + offset - p <= mbclen
 			      && mbclen < (size_t) -2)
@@ -835,7 +847,7 @@ re_string_reconstruct (re_string_t *pstr, int idx, int eflags)
 }
 
 static unsigned char
-internal_function __attribute ((pure))
+internal_function __attribute__ ((__pure__))
 re_string_peek_byte_case (const re_string_t *pstr, int idx)
 {
   int ch, off;
@@ -871,7 +883,7 @@ re_string_peek_byte_case (const re_string_t *pstr, int idx)
 }
 
 static unsigned char
-internal_function __attribute ((pure))
+internal_function
 re_string_fetch_byte_case (re_string_t *pstr)
 {
   if (BE (!pstr->mbs_allocated, 1))
@@ -940,7 +952,7 @@ re_string_context_at (const re_string_t *input, int idx, int eflags)
       int wc_idx = idx;
       while(input->wcs[wc_idx] == WEOF)
 	{
-#ifdef DEBUG
+#if defined DEBUG && DEBUG
 	  /* It must not happen.  */
 	  assert (wc_idx >= 0);
 #endif
@@ -967,7 +979,7 @@ re_string_context_at (const re_string_t *input, int idx, int eflags)
 /* Functions for set operation.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_node_set_alloc (re_node_set *set, int size)
 {
   /*
@@ -989,7 +1001,7 @@ re_node_set_alloc (re_node_set *set, int size)
 }
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_node_set_init_1 (re_node_set *set, int elem)
 {
   set->alloc = 1;
@@ -1005,7 +1017,7 @@ re_node_set_init_1 (re_node_set *set, int elem)
 }
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_node_set_init_2 (re_node_set *set, int elem1, int elem2)
 {
   set->alloc = 2;
@@ -1035,7 +1047,7 @@ re_node_set_init_2 (re_node_set *set, int elem1, int elem2)
 }
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_node_set_init_copy (re_node_set *dest, const re_node_set *src)
 {
   dest->nelem = src->nelem;
@@ -1060,7 +1072,7 @@ re_node_set_init_copy (re_node_set *dest, const re_node_set *src)
    Note: We assume dest->elems is NULL, when dest->alloc is 0.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_node_set_add_intersect (re_node_set *dest, const re_node_set *src1,
 			   const re_node_set *src2)
 {
@@ -1151,7 +1163,7 @@ re_node_set_add_intersect (re_node_set *dest, const re_node_set *src1,
    DEST. Return value indicate the error code or REG_NOERROR if succeeded.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_node_set_init_union (re_node_set *dest, const re_node_set *src1,
 			const re_node_set *src2)
 {
@@ -1204,7 +1216,7 @@ re_node_set_init_union (re_node_set *dest, const re_node_set *src1,
    DEST. Return value indicate the error code or REG_NOERROR if succeeded.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 re_node_set_merge (re_node_set *dest, const re_node_set *src)
 {
   int is, id, sbase, delta;
@@ -1284,10 +1296,10 @@ re_node_set_merge (re_node_set *dest, const re_node_set *src)
 
 /* Insert the new element ELEM to the re_node_set* SET.
    SET should not already have ELEM.
-   return -1 if an error has occurred, return 1 otherwise.  */
+   return -1 if an error is occured, return 1 otherwise.  */
 
 static int
-internal_function
+internal_function __attribute_warn_unused_result__
 re_node_set_insert (re_node_set *set, int elem)
 {
   int idx;
@@ -1341,10 +1353,10 @@ re_node_set_insert (re_node_set *set, int elem)
 
 /* Insert the new element ELEM to the re_node_set* SET.
    SET should not already have any element greater than or equal to ELEM.
-   Return -1 if an error has occurred, return 1 otherwise.  */
+   Return -1 if an error is occured, return 1 otherwise.  */
 
 static int
-internal_function
+internal_function __attribute_warn_unused_result__
 re_node_set_insert_last (re_node_set *set, int elem)
 {
   /* Realloc if we need.  */
@@ -1367,7 +1379,7 @@ re_node_set_insert_last (re_node_set *set, int elem)
    return 1 if SET1 and SET2 are equivalent, return 0 otherwise.  */
 
 static int
-internal_function __attribute ((pure))
+internal_function __attribute__ ((__pure__))
 re_node_set_compare (const re_node_set *set1, const re_node_set *set2)
 {
   int i;
@@ -1382,7 +1394,7 @@ re_node_set_compare (const re_node_set *set1, const re_node_set *set2)
 /* Return (idx + 1) if SET contains the element ELEM, return 0 otherwise.  */
 
 static int
-internal_function __attribute ((pure))
+internal_function __attribute__ ((__pure__))
 re_node_set_contains (const re_node_set *set, int elem)
 {
   unsigned int idx, right, mid;
@@ -1416,7 +1428,7 @@ re_node_set_remove_at (re_node_set *set, int idx)
 \f
 
 /* Add the token TOKEN to dfa->nodes, and return the index of the token.
-   Or return -1, if an error has occurred.  */
+   Or return -1, if an error will be occured.  */
 
 static int
 internal_function
@@ -1446,7 +1458,18 @@ re_dfa_add_node (re_dfa_t *dfa, re_token_t token)
       new_eclosures = re_realloc (dfa->eclosures, re_node_set, new_nodes_alloc);
       if (BE (new_nexts == NULL || new_indices == NULL
 	      || new_edests == NULL || new_eclosures == NULL, 0))
-	return -1;
+        {
+	   /* if any are not NULL, free them, avoid leaks */
+	   if (new_nexts != NULL)
+              re_free(new_nexts);
+	   if (new_indices != NULL)
+              re_free(new_indices);
+	   if (new_edests != NULL)
+              re_free(new_edests);
+	   if (new_eclosures != NULL)
+              re_free(new_eclosures);
+	   return -1;
+	}
       dfa->nexts = new_nexts;
       dfa->org_indices = new_indices;
       dfa->edests = new_edests;
@@ -1486,7 +1509,7 @@ calc_state_hash (const re_node_set *nodes, unsigned int context)
 	   optimization.  */
 
 static re_dfastate_t *
-internal_function
+internal_function __attribute_warn_unused_result__
 re_acquire_state (reg_errcode_t *err, const re_dfa_t *dfa,
 		  const re_node_set *nodes)
 {
@@ -1530,7 +1553,7 @@ re_acquire_state (reg_errcode_t *err, const re_dfa_t *dfa,
 	   optimization.  */
 
 static re_dfastate_t *
-internal_function
+internal_function __attribute_warn_unused_result__
 re_acquire_state_context (reg_errcode_t *err, const re_dfa_t *dfa,
 			  const re_node_set *nodes, unsigned int context)
 {
@@ -1567,6 +1590,7 @@ re_acquire_state_context (reg_errcode_t *err, const re_dfa_t *dfa,
    indicates the error code if failed.  */
 
 static reg_errcode_t
+__attribute_warn_unused_result__
 register_state (const re_dfa_t *dfa, re_dfastate_t *newstate,
 		unsigned int hash)
 {
@@ -1617,11 +1641,11 @@ free_state (re_dfastate_t *state)
   re_free (state);
 }
 
-/* Create the new state which is independ of contexts.
+/* Create the new state which is independent of contexts.
    Return the new state if succeeded, otherwise return NULL.  */
 
 static re_dfastate_t *
-internal_function
+internal_function __attribute_warn_unused_result__
 create_ci_newstate (const re_dfa_t *dfa, const re_node_set *nodes,
 		    unsigned int hash)
 {
@@ -1671,7 +1695,7 @@ create_ci_newstate (const re_dfa_t *dfa, const re_node_set *nodes,
    Return the new state if succeeded, otherwise return NULL.  */
 
 static re_dfastate_t *
-internal_function
+internal_function __attribute_warn_unused_result__
 create_cd_newstate (const re_dfa_t *dfa, const re_node_set *nodes,
 		    unsigned int context, unsigned int hash)
 {
diff --git a/compat/regex/regex_internal.h b/compat/regex/regex_internal.h
index 4184d7f5a6..9c88a6a57b 100644
--- a/compat/regex/regex_internal.h
+++ b/compat/regex/regex_internal.h
@@ -1,5 +1,12 @@
+/*
+ * This is git.git's copy of gawk.git's regex engine. Please see that
+ * project for the latest version & to submit patches to this code,
+ * and git.git's compat/regex/README for information on how git's copy
+ * of this code is maintained.
+ */
+
 /* Extended regular expression matching and search library.
-   Copyright (C) 2002-2005, 2007, 2008, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2002-2016 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Isamu Hasegawa <isamu@yamato.ibm.com>.
 
@@ -14,9 +21,8 @@
    Lesser General Public License for more details.
 
    You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
-   02111-1307 USA.  */
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
 
 #ifndef _REGEX_INTERNAL_H
 #define _REGEX_INTERNAL_H 1
@@ -42,13 +48,14 @@
 #if defined HAVE_STDBOOL_H || defined _LIBC
 # include <stdbool.h>
 #endif /* HAVE_STDBOOL_H || _LIBC */
-#if !defined(ZOS_USS)
 #if defined HAVE_STDINT_H || defined _LIBC
 # include <stdint.h>
 #endif /* HAVE_STDINT_H || _LIBC */
-#endif /* !ZOS_USS */
+ 
+#include "intprops.h"
+
 #if defined _LIBC
-# include <bits/libc-lock.h>
+# include <libc-lock.h>
 #else
 # define __libc_lock_define(CLASS,NAME)
 # define __libc_lock_init(NAME) do { } while (0)
@@ -80,7 +87,6 @@ is_blank (int c)
 # ifndef _RE_DEFINE_LOCALE_FUNCTIONS
 #  define _RE_DEFINE_LOCALE_FUNCTIONS 1
 #   include <locale/localeinfo.h>
-#   include <locale/elem-hash.h>
 #   include <locale/coll-lookup.h>
 # endif
 #endif
@@ -91,7 +97,7 @@ is_blank (int c)
 # ifdef _LIBC
 #  undef gettext
 #  define gettext(msgid) \
-  INTUSE(__dcgettext) (_libc_intl_domainname, msgid, LC_MESSAGES)
+  __dcgettext (_libc_intl_domainname, msgid, LC_MESSAGES)
 # endif
 #else
 # define gettext(msgid) (msgid)
@@ -108,14 +114,7 @@ is_blank (int c)
 # define SIZE_MAX ((size_t) -1)
 #endif
 
-#ifndef NO_MBSUPPORT
-#include "mbsupport.h" /* gawk */
-#endif
-#ifndef MB_CUR_MAX
-#define MB_CUR_MAX 1
-#endif
-
-#if (defined MBS_SUPPORT) || _LIBC
+#if ! defined(__DJGPP__) && (defined(GAWK) || _LIBC)
 # define RE_ENABLE_I18N
 #endif
 
@@ -154,17 +153,20 @@ is_blank (int c)
 # define __mempcpy mempcpy
 # define __wcrtomb wcrtomb
 # define __regfree regfree
-# define attribute_hidden
 #endif /* not _LIBC */
 
-#ifdef __GNUC__
-# define __attribute(arg) __attribute__ (arg)
-#else
-# define __attribute(arg)
+#if __GNUC__ < 3 + (__GNUC_MINOR__ < 1)
+# define __attribute__(arg)
 #endif
 
-extern const char __re_error_msgid[] attribute_hidden;
-extern const size_t __re_error_msgid_idx[] attribute_hidden;
+#ifdef GAWK
+/*
+ * Instead of trying to figure out which GCC version introduced
+ * this symbol, just define it out and be done.
+ */
+# undef __attribute_warn_unused_result__
+# define __attribute_warn_unused_result__
+#endif
 
 /* An integer used to represent a set of bits.  It must be unsigned,
    and must be at least as wide as unsigned int.  */
@@ -414,7 +416,7 @@ typedef struct re_dfa_t re_dfa_t;
 
 #ifndef _LIBC
 # ifdef __i386__
-#  define internal_function   __attribute ((regparm (3), stdcall))
+#  define internal_function   __attribute__ ((regparm (3), stdcall))
 # else
 #  define internal_function
 # endif
@@ -433,7 +435,7 @@ static void build_upper_buffer (re_string_t *pstr) internal_function;
 static void re_string_translate_buffer (re_string_t *pstr) internal_function;
 static unsigned int re_string_context_at (const re_string_t *input, int idx,
 					  int eflags)
-     internal_function __attribute ((pure));
+     internal_function __attribute__ ((pure));
 #endif
 #define re_string_peek_byte(pstr, offset) \
   ((pstr)->mbs[(pstr)->cur_idx + offset])
@@ -454,26 +456,50 @@ static unsigned int re_string_context_at (const re_string_t *input, int idx,
 
 #ifndef _LIBC
 # if HAVE_ALLOCA
-#  if (_MSC_VER)
-#   include <malloc.h>
-#   define __libc_use_alloca(n) 0
-#  else
-#   include <alloca.h>
+#  include <alloca.h>
 /* The OS usually guarantees only one guard page at the bottom of the stack,
    and a page size can be as small as 4096 bytes.  So we cannot safely
    allocate anything larger than 4096 bytes.  Also care for the possibility
    of a few compiler-allocated temporary stack slots.  */
 #  define __libc_use_alloca(n) ((n) < 4032)
-#  endif
 # else
 /* alloca is implemented with malloc, so just use malloc.  */
 #  define __libc_use_alloca(n) 0
 # endif
 #endif
 
+/*
+ * GAWK checks for zero-size allocations everywhere else,
+ * do it here too.
+ */
+#ifndef GAWK
 #define re_malloc(t,n) ((t *) malloc ((n) * sizeof (t)))
-/* SunOS 4.1.x realloc doesn't accept null pointers: pre-Standard C. Sigh. */
-#define re_realloc(p,t,n) ((p != NULL) ? (t *) realloc (p,(n)*sizeof(t)) : (t *) calloc(n,sizeof(t)))
+#define re_realloc(p,t,n) ((t *) realloc (p, (n) * sizeof (t)))
+#else
+static void *
+test_malloc(size_t count, const char *file, size_t line)
+{
+	if (count == 0) {
+		fprintf(stderr, "%s:%lu: allocation of zero bytes\n",
+				file, (unsigned long) line);
+		exit(1);
+	}
+	return malloc(count);
+}
+
+static void *
+test_realloc(void *p, size_t count, const char *file, size_t line)
+{
+	if (count == 0) {
+		fprintf(stderr, "%s:%lu: reallocation of zero bytes\n",
+				file, (unsigned long) line);
+		exit(1);
+	}
+	return realloc(p, count);
+}
+#define re_malloc(t,n) ((t *) test_malloc (((n) * sizeof (t)), __FILE__, __LINE__))
+#define re_realloc(p,t,n) ((t *) test_realloc (p, (n) * sizeof (t), __FILE__, __LINE__))
+#endif
 #define re_free(p) free (p)
 
 struct bin_tree_t
@@ -729,7 +755,7 @@ typedef struct
 
 
 /* Inline functions for bitset operation.  */
-static inline void
+static void __attribute__ ((unused))
 bitset_not (bitset_t set)
 {
   int bitset_i;
@@ -737,7 +763,7 @@ bitset_not (bitset_t set)
     set[bitset_i] = ~set[bitset_i];
 }
 
-static inline void
+static void __attribute__ ((unused))
 bitset_merge (bitset_t dest, const bitset_t src)
 {
   int bitset_i;
@@ -745,7 +771,7 @@ bitset_merge (bitset_t dest, const bitset_t src)
     dest[bitset_i] |= src[bitset_i];
 }
 
-static inline void
+static void __attribute__ ((unused))
 bitset_mask (bitset_t dest, const bitset_t src)
 {
   int bitset_i;
@@ -755,8 +781,8 @@ bitset_mask (bitset_t dest, const bitset_t src)
 
 #ifdef RE_ENABLE_I18N
 /* Inline functions for re_string.  */
-static inline int
-internal_function __attribute ((pure))
+static int
+internal_function __attribute__ ((pure, unused))
 re_string_char_size_at (const re_string_t *pstr, int idx)
 {
   int byte_idx;
@@ -768,8 +794,8 @@ re_string_char_size_at (const re_string_t *pstr, int idx)
   return byte_idx;
 }
 
-static inline wint_t
-internal_function __attribute ((pure))
+static wint_t
+internal_function __attribute__ ((pure, unused))
 re_string_wchar_at (const re_string_t *pstr, int idx)
 {
   if (pstr->mb_cur_max == 1)
@@ -778,15 +804,17 @@ re_string_wchar_at (const re_string_t *pstr, int idx)
 }
 
 # ifndef NOT_IN_libc
+#  ifdef _LIBC
+#   include <locale/weight.h>
+#  endif
+
 static int
-internal_function __attribute ((pure))
+internal_function __attribute__ ((pure, unused))
 re_string_elem_size_at (const re_string_t *pstr, int idx)
 {
 #  ifdef _LIBC
   const unsigned char *p, *extra;
   const int32_t *table, *indirect;
-  int32_t tmp;
-#   include <locale/weight.h>
   uint_fast32_t nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);
 
   if (nrules != 0)
@@ -797,7 +825,7 @@ re_string_elem_size_at (const re_string_t *pstr, int idx)
       indirect = (const int32_t *) _NL_CURRENT (LC_COLLATE,
 						_NL_COLLATE_INDIRECTMB);
       p = pstr->mbs + idx;
-      tmp = findidx (&p);
+      findidx (table, indirect, extra, &p, pstr->len - idx);
       return p - pstr->mbs - idx;
     }
   else
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/7] fixup! compat/regex: update the gawk regex engine from upstream
  2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2017-05-04 22:00 ` [PATCH 5/7] " Ævar Arnfjörð Bjarmason
@ 2017-05-04 22:00 ` Ævar Arnfjörð Bjarmason
  2017-05-04 22:00 ` [PATCH 7/7] " Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-04 22:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins,
	Ævar Arnfjörð Bjarmason

---
 compat/regex/regex.h   | 120 +++++++++++++-----------
 compat/regex/regexec.c | 242 +++++++++++++++++++++++++------------------------
 2 files changed, 193 insertions(+), 169 deletions(-)

diff --git a/compat/regex/regex.h b/compat/regex/regex.h
index 61c9683872..b602b5567f 100644
--- a/compat/regex/regex.h
+++ b/compat/regex/regex.h
@@ -1,10 +1,13 @@
-#include <stdio.h>
-#include <stddef.h>
+/*
+ * This is git.git's copy of gawk.git's regex engine. Please see that
+ * project for the latest version & to submit patches to this code,
+ * and git.git's compat/regex/README for information on how git's copy
+ * of this code is maintained.
+ */
 
 /* Definitions for data structures and routines for the regular
    expression library.
-   Copyright (C) 1985,1989-93,1995-98,2000,2001,2002,2003,2005,2006,2008
-   Free Software Foundation, Inc.
+   Copyright (C) 1985, 1989-2016 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -18,9 +21,8 @@
    Lesser General Public License for more details.
 
    You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
-   02110-1301 USA.  */
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
 
 #ifndef _REGEX_H
 #define _REGEX_H 1
@@ -75,10 +77,10 @@ typedef unsigned long int reg_syntax_t;
 /* If this bit is set, then ^ and $ are always anchors (outside bracket
      expressions, of course).
    If this bit is not set, then it depends:
-        ^  is an anchor if it is at the beginning of a regular
-           expression or after an open-group or an alternation operator;
-        $  is an anchor if it is at the end of a regular expression, or
-           before a close-group or an alternation operator.
+	^  is an anchor if it is at the beginning of a regular
+	   expression or after an open-group or an alternation operator;
+	$  is an anchor if it is at the end of a regular expression, or
+	   before a close-group or an alternation operator.
 
    This bit could be (re)combined with RE_CONTEXT_INDEP_OPS, because
    POSIX draft 11.2 says that * etc. in leading positions is undefined.
@@ -158,10 +160,18 @@ typedef unsigned long int reg_syntax_t;
    If not set, then the GNU regex operators are recognized. */
 # define RE_NO_GNU_OPS (RE_NO_POSIX_BACKTRACKING << 1)
 
+/* If this bit is set, turn on internal regex debugging.
+   If not set, and debugging was on, turn it off.
+   This only works if regex.c is compiled -DDEBUG.
+   We define this bit always, so that all that's needed to turn on
+   debugging is to recompile regex.c; the calling code can always have
+   this bit set, and it won't affect anything in the normal case. */
+# define RE_DEBUG (RE_NO_GNU_OPS << 1)
+
 /* If this bit is set, a syntactically invalid interval is treated as
    a string of ordinary characters.  For example, the ERE 'a{1' is
    treated as 'a\{1'.  */
-# define RE_INVALID_INTERVAL_ORD (RE_NO_GNU_OPS << 1)
+# define RE_INVALID_INTERVAL_ORD (RE_DEBUG << 1)
 
 /* If this bit is set, then ignore case when matching.
    If not set, then case is significant.  */
@@ -178,7 +188,7 @@ typedef unsigned long int reg_syntax_t;
 
 /* If this bit is set, then no_sub will be set to 1 during
    re_compile_pattern.  */
-#define RE_NO_SUB (RE_CONTEXT_INVALID_DUP << 1)
+# define RE_NO_SUB (RE_CONTEXT_INVALID_DUP << 1)
 #endif
 
 /* This global variable defines the particular regexp syntax to use (for
@@ -199,13 +209,14 @@ extern reg_syntax_t re_syntax_options;
    | RE_NO_BK_PARENS              | RE_NO_BK_REFS			\
    | RE_NO_BK_VBAR                | RE_NO_EMPTY_RANGES			\
    | RE_DOT_NEWLINE		  | RE_CONTEXT_INDEP_ANCHORS		\
+   | RE_CHAR_CLASSES							\
    | RE_UNMATCHED_RIGHT_PAREN_ORD | RE_NO_GNU_OPS)
 
 #define RE_SYNTAX_GNU_AWK						\
   ((RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS		\
-   | RE_INVALID_INTERVAL_ORD)						\
+    | RE_INVALID_INTERVAL_ORD)						\
    & ~(RE_DOT_NOT_NULL | RE_CONTEXT_INDEP_OPS				\
-       | RE_CONTEXT_INVALID_OPS ))
+      | RE_CONTEXT_INVALID_OPS ))
 
 #define RE_SYNTAX_POSIX_AWK						\
   (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS		\
@@ -323,7 +334,7 @@ typedef enum
   /* POSIX regcomp return error codes.  (In the order listed in the
      standard.)  */
   REG_BADPAT,		/* Invalid pattern.  */
-  REG_ECOLLATE,		/* Inalid collating element.  */
+  REG_ECOLLATE,		/* Invalid collating element.  */
   REG_ECTYPE,		/* Invalid character class name.  */
   REG_EESCAPE,		/* Trailing backslash.  */
   REG_ESUBREG,		/* Invalid back reference.  */
@@ -343,9 +354,9 @@ typedef enum
 \f
 /* This data structure represents a compiled pattern.  Before calling
    the pattern compiler, the fields `buffer', `allocated', `fastmap',
-   `translate', and `no_sub' can be set.  After the pattern has been
-   compiled, the `re_nsub' field is available.  All other fields are
-   private to the regex routines.  */
+   and `translate' can be set.  After the pattern has been compiled,
+   the fields `re_nsub', `not_bol' and `not_eol' are available.  All
+   other fields are private to the regex routines.  */
 
 #ifndef RE_TRANSLATE_TYPE
 # define __RE_TRANSLATE_TYPE unsigned char *
@@ -466,19 +477,24 @@ typedef struct
 #ifdef __USE_GNU
 /* Sets the current default syntax to SYNTAX, and return the old syntax.
    You can also simply assign to the `re_syntax_options' variable.  */
-extern reg_syntax_t re_set_syntax (reg_syntax_t __syntax);
+extern reg_syntax_t re_set_syntax (reg_syntax_t syntax);
 
 /* Compile the regular expression PATTERN, with length LENGTH
    and syntax given by the global `re_syntax_options', into the buffer
-   BUFFER.  Return NULL if successful, and an error string if not.  */
-extern const char *re_compile_pattern (const char *__pattern, size_t __length,
-				       struct re_pattern_buffer *__buffer);
+   BUFFER.  Return NULL if successful, and an error string if not.
+
+   To free the allocated storage, you must call `regfree' on BUFFER.
+   Note that the translate table must either have been initialised by
+   `regcomp', with a malloc'ed value, or set to NULL before calling
+   `regfree'.  */
+extern const char *re_compile_pattern (const char *pattern, size_t length,
+				       struct re_pattern_buffer *buffer);
 
 
 /* Compile a fastmap for the compiled pattern in BUFFER; used to
    accelerate searches.  Return 0 if successful and -2 if was an
    internal error.  */
-extern int re_compile_fastmap (struct re_pattern_buffer *__buffer);
+extern int re_compile_fastmap (struct re_pattern_buffer *buffer);
 
 
 /* Search in the string STRING (with length LENGTH) for the pattern
@@ -486,30 +502,30 @@ extern int re_compile_fastmap (struct re_pattern_buffer *__buffer);
    characters.  Return the starting position of the match, -1 for no
    match, or -2 for an internal error.  Also return register
    information in REGS (if REGS and BUFFER->no_sub are nonzero).  */
-extern int re_search (struct re_pattern_buffer *__buffer, const char *__cstring,
-		      int __length, int __start, int __range,
-		      struct re_registers *__regs);
+extern int re_search (struct re_pattern_buffer *buffer, const char *c_string,
+		      int length, int start, int range,
+		      struct re_registers *regs);
 
 
 /* Like `re_search', but search in the concatenation of STRING1 and
    STRING2.  Also, stop searching at index START + STOP.  */
-extern int re_search_2 (struct re_pattern_buffer *__buffer,
-			const char *__string1, int __length1,
-			const char *__string2, int __length2, int __start,
-			int __range, struct re_registers *__regs, int __stop);
+extern int re_search_2 (struct re_pattern_buffer *buffer,
+			const char *string1, int length1,
+			const char *string2, int length2, int start,
+			int range, struct re_registers *regs, int stop);
 
 
 /* Like `re_search', but return how many characters in STRING the regexp
    in BUFFER matched, starting at position START.  */
-extern int re_match (struct re_pattern_buffer *__buffer, const char *__cstring,
-		     int __length, int __start, struct re_registers *__regs);
+extern int re_match (struct re_pattern_buffer *buffer, const char *c_string,
+		     int length, int start, struct re_registers *regs);
 
 
 /* Relates to `re_match' as `re_search_2' relates to `re_search'.  */
-extern int re_match_2 (struct re_pattern_buffer *__buffer,
-		       const char *__string1, int __length1,
-		       const char *__string2, int __length2, int __start,
-		       struct re_registers *__regs, int __stop);
+extern int re_match_2 (struct re_pattern_buffer *buffer,
+		       const char *string1, int length1,
+		       const char *string2, int length2, int start,
+		       struct re_registers *regs, int stop);
 
 
 /* Set REGS to hold NUM_REGS registers, storing them in STARTS and
@@ -524,13 +540,13 @@ extern int re_match_2 (struct re_pattern_buffer *__buffer,
    Unless this function is called, the first search or match using
    PATTERN_BUFFER will allocate its own register data, without
    freeing the old data.  */
-extern void re_set_registers (struct re_pattern_buffer *__buffer,
-			      struct re_registers *__regs,
-			      unsigned int __num_regs,
-			      regoff_t *__starts, regoff_t *__ends);
+extern void re_set_registers (struct re_pattern_buffer *buffer,
+			      struct re_registers *regs,
+			      unsigned int num_regs,
+			      regoff_t *starts, regoff_t *ends);
 #endif	/* Use GNU */
 
-#if defined _REGEX_RE_COMP || (defined _LIBC && defined __USE_BSD)
+#if defined _REGEX_RE_COMP || (defined _LIBC && defined __USE_MISC)
 # ifndef _CRAY
 /* 4.2 bsd compatibility.  */
 extern char *re_comp (const char *);
@@ -560,19 +576,19 @@ extern int re_exec (const char *);
 #endif
 
 /* POSIX compatibility.  */
-extern int regcomp (regex_t *__restrict __preg,
-		    const char *__restrict __pattern,
-		    int __cflags);
+extern int regcomp (regex_t *__restrict preg,
+		    const char *__restrict pattern,
+		    int cflags);
 
-extern int regexec (const regex_t *__restrict __preg,
-		    const char *__restrict __cstring, size_t __nmatch,
-		    regmatch_t __pmatch[__restrict_arr],
-		    int __eflags);
+extern int regexec (const regex_t *__restrict preg,
+		    const char *__restrict c_string, size_t nmatch,
+		    regmatch_t pmatch[__restrict_arr],
+		    int eflags);
 
-extern size_t regerror (int __errcode, const regex_t *__restrict __preg,
-			char *__restrict __errbuf, size_t __errbuf_size);
+extern size_t regerror (int errcode, const regex_t *__restrict preg,
+			char *__restrict errbuf, size_t errbuf_size);
 
-extern void regfree (regex_t *__preg);
+extern void regfree (regex_t *preg);
 
 
 #ifdef __cplusplus
diff --git a/compat/regex/regexec.c b/compat/regex/regexec.c
index eb5e1d4439..c79ff38b1c 100644
--- a/compat/regex/regexec.c
+++ b/compat/regex/regexec.c
@@ -1,5 +1,12 @@
+/*
+ * This is git.git's copy of gawk.git's regex engine. Please see that
+ * project for the latest version & to submit patches to this code,
+ * and git.git's compat/regex/README for information on how git's copy
+ * of this code is maintained.
+ */
+
 /* Extended regular expression matching and search library.
-   Copyright (C) 2002-2005, 2007, 2009, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2002-2016 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Isamu Hasegawa <isamu@yamato.ibm.com>.
 
@@ -14,9 +21,12 @@
    Lesser General Public License for more details.
 
    You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
-   02110-1301 USA.  */
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef HAVE_STDINT_H
+#include <stdint.h>
+#endif /* HAVE_STDINT_H */
 
 static reg_errcode_t match_ctx_init (re_match_context_t *cache, int eflags,
 				     int n) internal_function;
@@ -40,18 +50,18 @@ static reg_errcode_t re_search_internal (const regex_t *preg,
 					 const char *string, int length,
 					 int start, int range, int stop,
 					 size_t nmatch, regmatch_t pmatch[],
-					 int eflags);
+					 int eflags) internal_function;
 static int re_search_2_stub (struct re_pattern_buffer *bufp,
 			     const char *string1, int length1,
 			     const char *string2, int length2,
 			     int start, int range, struct re_registers *regs,
-			     int stop, int ret_len);
+			     int stop, int ret_len) internal_function;
 static int re_search_stub (struct re_pattern_buffer *bufp,
 			   const char *string, int length, int start,
 			   int range, int stop, struct re_registers *regs,
-			   int ret_len);
+			   int ret_len) internal_function;
 static unsigned re_copy_regs (struct re_registers *regs, regmatch_t *pmatch,
-			      int nregs, int regs_allocated);
+			      int nregs, int regs_allocated) internal_function;
 static reg_errcode_t prune_impossible_nodes (re_match_context_t *mctx);
 static int check_matching (re_match_context_t *mctx, int fl_longest_match,
 			   int *p_match_first) internal_function;
@@ -197,8 +207,17 @@ static int group_nodes_into_DFAstates (const re_dfa_t *dfa,
 static int check_node_accept (const re_match_context_t *mctx,
 			      const re_token_t *node, int idx)
      internal_function;
-static reg_errcode_t extend_buffers (re_match_context_t *mctx)
+static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len)
      internal_function;
+
+#ifdef GAWK
+#undef MIN	/* safety */
+static int
+MIN(size_t a, size_t b)
+{
+	return (a < b ? a : b);
+}
+#endif
 \f
 /* Entry point for POSIX code.  */
 
@@ -217,12 +236,8 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx)
    We return 0 if we find a match and REG_NOMATCH if not.  */
 
 int
-regexec (
-	const regex_t *__restrict preg,
-	const char *__restrict string,
-	size_t nmatch,
-	regmatch_t pmatch[],
-	int eflags)
+regexec (const regex_t *__restrict preg, const char *__restrict string,
+	 size_t nmatch, regmatch_t pmatch[], int eflags)
 {
   reg_errcode_t err;
   int start, length;
@@ -293,7 +308,7 @@ compat_symbol (libc, __compat_regexec, regexec, GLIBC_2_0);
    concerned.
 
    If REGS is not NULL, and BUFP->no_sub is not set, the offsets of the match
-   and all groups is stroed in REGS.  (For the "_2" variants, the offsets are
+   and all groups is stored in REGS.  (For the "_2" variants, the offsets are
    computed relative to the concatenation, not relative to the individual
    strings.)
 
@@ -302,11 +317,8 @@ compat_symbol (libc, __compat_regexec, regexec, GLIBC_2_0);
    match was found and -2 indicates an internal error.  */
 
 int
-re_match (struct re_pattern_buffer *bufp,
-	  const char *string,
-	  int length,
-	  int start,
-	  struct re_registers *regs)
+re_match (struct re_pattern_buffer *bufp, const char *string, int length,
+	  int start, struct re_registers *regs)
 {
   return re_search_stub (bufp, string, length, start, 0, length, regs, 1);
 }
@@ -315,10 +327,8 @@ weak_alias (__re_match, re_match)
 #endif
 
 int
-re_search (struct re_pattern_buffer *bufp,
-	   const char *string,
-	   int length, int start, int range,
-	   struct re_registers *regs)
+re_search (struct re_pattern_buffer *bufp, const char *string, int length,
+	   int start, int range, struct re_registers *regs)
 {
   return re_search_stub (bufp, string, length, start, range, length, regs, 0);
 }
@@ -327,8 +337,7 @@ weak_alias (__re_search, re_search)
 #endif
 
 int
-re_match_2 (struct re_pattern_buffer *bufp,
-	    const char *string1, int length1,
+re_match_2 (struct re_pattern_buffer *bufp, const char *string1, int length1,
 	    const char *string2, int length2, int start,
 	    struct re_registers *regs, int stop)
 {
@@ -340,10 +349,9 @@ weak_alias (__re_match_2, re_match_2)
 #endif
 
 int
-re_search_2 (struct re_pattern_buffer *bufp,
-	     const char *string1, int length1,
-	     const char *string2, int length2, int start,
-	     int range, struct re_registers *regs,  int stop)
+re_search_2 (struct re_pattern_buffer *bufp, const char *string1, int length1,
+	     const char *string2, int length2, int start, int range,
+	     struct re_registers *regs, int stop)
 {
   return re_search_2_stub (bufp, string1, length1, string2, length2,
 			   start, range, regs, stop, 0);
@@ -353,32 +361,37 @@ weak_alias (__re_search_2, re_search_2)
 #endif
 
 static int
-re_search_2_stub (struct re_pattern_buffer *bufp,
-		  const char *string1, int length1,
-		  const char *string2, int length2, int start,
+internal_function
+re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
+		  int length1, const char *string2, int length2, int start,
 		  int range, struct re_registers *regs,
 		  int stop, int ret_len)
 {
   const char *str;
   int rval;
-  int len = length1 + length2;
-  int free_str = 0;
+  int len;
+  char *s = NULL;
 
-  if (BE (length1 < 0 || length2 < 0 || stop < 0, 0))
+  if (BE ((length1 < 0 || length2 < 0 || stop < 0
+           || INT_ADD_WRAPV (length1, length2, &len)),
+          0))
     return -2;
 
   /* Concatenate the strings.  */
   if (length2 > 0)
     if (length1 > 0)
       {
-	char *s = re_malloc (char, len);
+	s = re_malloc (char, len);
 
 	if (BE (s == NULL, 0))
 	  return -2;
+#ifdef _LIBC
+	memcpy (__mempcpy (s, string1, length1), string2, length2);
+#else
 	memcpy (s, string1, length1);
 	memcpy (s + length1, string2, length2);
+#endif
 	str = s;
-	free_str = 1;
       }
     else
       str = string2;
@@ -386,8 +399,7 @@ re_search_2_stub (struct re_pattern_buffer *bufp,
     str = string1;
 
   rval = re_search_stub (bufp, str, len, start, range, stop, regs, ret_len);
-  if (free_str)
-    re_free ((char *) str);
+  re_free (s);
   return rval;
 }
 
@@ -397,10 +409,10 @@ re_search_2_stub (struct re_pattern_buffer *bufp,
    otherwise the position of the match is returned.  */
 
 static int
-re_search_stub (struct re_pattern_buffer *bufp,
-		const char *string, int length, int start,
-		int range, int stop,
-		struct re_registers *regs, int ret_len)
+internal_function
+re_search_stub (struct re_pattern_buffer *bufp, const char *string, int length,
+		int start, int range, int stop, struct re_registers *regs,
+		int ret_len)
 {
   reg_errcode_t result;
   regmatch_t *pmatch;
@@ -455,7 +467,7 @@ re_search_stub (struct re_pattern_buffer *bufp,
 
   rval = 0;
 
-  /* I hope we needn't fill their regs with -1's when no match was found.  */
+  /* I hope we needn't fill ther regs with -1's when no match was found.  */
   if (result != REG_NOERROR)
     rval = -1;
   else if (regs != NULL)
@@ -484,9 +496,9 @@ re_search_stub (struct re_pattern_buffer *bufp,
 }
 
 static unsigned
-re_copy_regs (struct re_registers *regs,
-	      regmatch_t *pmatch,
-	      int nregs, int regs_allocated)
+internal_function
+re_copy_regs (struct re_registers *regs, regmatch_t *pmatch, int nregs,
+	      int regs_allocated)
 {
   int rval = REGS_REALLOCATE;
   int i;
@@ -563,11 +575,8 @@ re_copy_regs (struct re_registers *regs,
    freeing the old data.  */
 
 void
-re_set_registers (struct re_pattern_buffer *bufp,
-		  struct re_registers *regs,
-		  unsigned num_regs,
-		  regoff_t *starts,
-		  regoff_t *ends)
+re_set_registers (struct re_pattern_buffer *bufp, struct re_registers *regs,
+		  unsigned num_regs, regoff_t *starts, regoff_t *ends)
 {
   if (num_regs)
     {
@@ -595,8 +604,7 @@ int
 # ifdef _LIBC
 weak_function
 # endif
-re_exec (s)
-     const char *s;
+re_exec (const char *s)
 {
   return 0 == regexec (&re_comp_buf, s, 0, NULL, 0);
 }
@@ -606,7 +614,7 @@ re_exec (s)
 
 /* Searches for a compiled pattern PREG in the string STRING, whose
    length is LENGTH.  NMATCH, PMATCH, and EFLAGS have the same
-   mingings with regexec.  START, and RANGE have the same meanings
+   meaning as with regexec.  START, and RANGE have the same meanings
    with re_search.
    Return REG_NOERROR if we find a match, and REG_NOMATCH if not,
    otherwise return the error code.
@@ -614,11 +622,10 @@ re_exec (s)
    (START + RANGE >= 0 && START + RANGE <= LENGTH)  */
 
 static reg_errcode_t
-re_search_internal (const regex_t *preg,
-		    const char *string,
-		    int length, int start, int range, int stop,
-		    size_t nmatch, regmatch_t pmatch[],
-		    int eflags)
+__attribute_warn_unused_result__ internal_function
+re_search_internal (const regex_t *preg, const char *string, int length,
+		    int start, int range, int stop, size_t nmatch,
+		    regmatch_t pmatch[], int eflags)
 {
   reg_errcode_t err;
   const re_dfa_t *dfa = (const re_dfa_t *) preg->buffer;
@@ -644,7 +651,7 @@ re_search_internal (const regex_t *preg,
   nmatch -= extra_nmatch;
 
   /* Check if the DFA haven't been compiled.  */
-  if (BE (preg->used == 0 || dfa->init_state == NULL
+  if (BE (preg->used == 0 || dfa == NULL || dfa->init_state == NULL
 	  || dfa->init_state_word == NULL || dfa->init_state_nl == NULL
 	  || dfa->init_state_begbuf == NULL, 0))
     return REG_NOMATCH;
@@ -671,7 +678,8 @@ re_search_internal (const regex_t *preg,
   fl_longest_match = (nmatch != 0 || dfa->nbackref);
 
   err = re_string_allocate (&mctx.input, string, length, dfa->nodes_len + 1,
-			    preg->translate, preg->syntax & RE_ICASE, dfa);
+			    preg->translate, (preg->syntax & RE_ICASE) != 0,
+			    dfa);
   if (BE (err != REG_NOERROR, 0))
     goto free_return;
   mctx.input.stop = stop;
@@ -888,7 +896,7 @@ re_search_internal (const regex_t *preg,
 	    goto free_return;
 	}
 
-      /* At last, add the offset to the each registers, since we slided
+      /* At last, add the offset to each register, since we slid
 	 the buffers so that we could assume that the matching starts
 	 from 0.  */
       for (reg_idx = 0; reg_idx < nmatch; ++reg_idx)
@@ -1033,7 +1041,7 @@ prune_impossible_nodes (re_match_context_t *mctx)
    since initial states may have constraints like "\<", "^", etc..  */
 
 static inline re_dfastate_t *
-__attribute ((always_inline)) internal_function
+__attribute__ ((always_inline)) internal_function
 acquire_init_state_context (reg_errcode_t *err, const re_match_context_t *mctx,
 			    int idx)
 {
@@ -1071,11 +1079,11 @@ acquire_init_state_context (reg_errcode_t *err, const re_match_context_t *mctx,
    FL_LONGEST_MATCH means we want the POSIX longest matching.
    If P_MATCH_FIRST is not NULL, and the match fails, it is set to the
    next place where we may want to try matching.
-   Note that the matcher assume that the matching starts from the current
+   Note that the matcher assume that the maching starts from the current
    index of the buffer.  */
 
 static int
-internal_function
+internal_function __attribute_warn_unused_result__
 check_matching (re_match_context_t *mctx, int fl_longest_match,
 		int *p_match_first)
 {
@@ -1140,11 +1148,12 @@ check_matching (re_match_context_t *mctx, int fl_longest_match,
       re_dfastate_t *old_state = cur_state;
       int next_char_idx = re_string_cur_idx (&mctx->input) + 1;
 
-      if (BE (next_char_idx >= mctx->input.bufs_len, 0)
+      if ((BE (next_char_idx >= mctx->input.bufs_len, 0)
+	   && mctx->input.bufs_len < mctx->input.len)
 	  || (BE (next_char_idx >= mctx->input.valid_len, 0)
 	      && mctx->input.valid_len < mctx->input.len))
 	{
-	  err = extend_buffers (mctx);
+	  err = extend_buffers (mctx, next_char_idx + 1);
 	  if (BE (err != REG_NOERROR, 0))
 	    {
 	      assert (err == REG_ESPACE);
@@ -1348,7 +1357,7 @@ proceed_next_node (const re_match_context_t *mctx, int nregs, regmatch_t *regs,
 }
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 push_fail_stack (struct re_fail_stack_t *fs, int str_idx, int dest_node,
 		 int nregs, regmatch_t *regs, re_node_set *eps_via_nodes)
 {
@@ -1395,7 +1404,7 @@ pop_fail_stack (struct re_fail_stack_t *fs, int *pidx, int nregs,
    pmatch[i].rm_so == pmatch[i].rm_eo == -1 for 0 < i < nmatch.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
 	  regmatch_t *pmatch, int fl_backtrack)
 {
@@ -1651,7 +1660,7 @@ sift_states_backward (const re_match_context_t *mctx, re_sift_context_t *sctx)
 }
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 build_sifted_states (const re_match_context_t *mctx, re_sift_context_t *sctx,
 		     int str_idx, re_node_set *cur_dest)
 {
@@ -1718,12 +1727,13 @@ clean_state_log_if_needed (re_match_context_t *mctx, int next_state_log_idx)
 {
   int top = mctx->state_log_top;
 
-  if (next_state_log_idx >= mctx->input.bufs_len
+  if ((next_state_log_idx >= mctx->input.bufs_len
+       && mctx->input.bufs_len < mctx->input.len)
       || (next_state_log_idx >= mctx->input.valid_len
 	  && mctx->input.valid_len < mctx->input.len))
     {
       reg_errcode_t err;
-      err = extend_buffers (mctx);
+      err = extend_buffers (mctx, next_state_log_idx + 1);
       if (BE (err != REG_NOERROR, 0))
 	return err;
     }
@@ -1813,7 +1823,7 @@ update_cur_sifted_state (const re_match_context_t *mctx,
 }
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 add_epsilon_src_nodes (const re_dfa_t *dfa, re_node_set *dest_nodes,
 		       const re_node_set *candidates)
 {
@@ -2126,7 +2136,7 @@ check_subexp_limits (const re_dfa_t *dfa, re_node_set *dest_nodes,
 }
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 sift_states_bkref (const re_match_context_t *mctx, re_sift_context_t *sctx,
 		   int str_idx, const re_node_set *candidates)
 {
@@ -2239,7 +2249,7 @@ sift_states_iter_mb (const re_match_context_t *mctx, re_sift_context_t *sctx,
 			    dfa->nexts[node_idx]))
     /* The node can't accept the `multi byte', or the
        destination was already thrown away, then the node
-       couldn't accept the current input `multi byte'.   */
+       could't accept the current input `multi byte'.   */
     naccepted = 0;
   /* Otherwise, it is sure that the node could accept
      `naccepted' bytes input.  */
@@ -2256,7 +2266,7 @@ sift_states_iter_mb (const re_match_context_t *mctx, re_sift_context_t *sctx,
    update the destination of STATE_LOG.  */
 
 static re_dfastate_t *
-internal_function
+internal_function __attribute_warn_unused_result__
 transit_state (reg_errcode_t *err, re_match_context_t *mctx,
 	       re_dfastate_t *state)
 {
@@ -2313,7 +2323,7 @@ transit_state (reg_errcode_t *err, re_match_context_t *mctx,
 }
 
 /* Update the state_log if we need */
-static re_dfastate_t *
+re_dfastate_t *
 internal_function
 merge_state_with_log (reg_errcode_t *err, re_match_context_t *mctx,
 		      re_dfastate_t *next_state)
@@ -2326,7 +2336,7 @@ merge_state_with_log (reg_errcode_t *err, re_match_context_t *mctx,
       mctx->state_log[cur_idx] = next_state;
       mctx->state_log_top = cur_idx;
     }
-  else if (mctx->state_log[cur_idx] == NULL)
+  else if (mctx->state_log[cur_idx] == 0)
     {
       mctx->state_log[cur_idx] = next_state;
     }
@@ -2421,7 +2431,7 @@ find_recover_state (reg_errcode_t *err, re_match_context_t *mctx)
 /* From the node set CUR_NODES, pick up the nodes whose types are
    OP_OPEN_SUBEXP and which have corresponding back references in the regular
    expression. And register them to use them later for evaluating the
-   correspoding back references.  */
+   corresponding back references.  */
 
 static reg_errcode_t
 internal_function
@@ -2681,7 +2691,7 @@ transit_state_bkref (re_match_context_t *mctx, const re_node_set *nodes)
    delay these checking for prune_impossible_nodes().  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 get_subexp (re_match_context_t *mctx, int bkref_node, int bkref_str_idx)
 {
   const re_dfa_t *const dfa = mctx->dfa;
@@ -2777,7 +2787,7 @@ get_subexp (re_match_context_t *mctx, int bkref_node, int bkref_str_idx)
 		  if (bkref_str_off >= mctx->input.len)
 		    break;
 
-		  err = extend_buffers (mctx);
+		  err = extend_buffers (mctx, bkref_str_off + 1);
 		  if (BE (err != REG_NOERROR, 0))
 		    return err;
 
@@ -2881,7 +2891,7 @@ find_subexp_node (const re_dfa_t *dfa, const re_node_set *nodes,
    Return REG_NOERROR if it can arrive, or REG_NOMATCH otherwise.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 check_arrival (re_match_context_t *mctx, state_array_t *path, int top_node,
 	       int top_str, int last_node, int last_str, int type)
 {
@@ -3042,7 +3052,7 @@ check_arrival (re_match_context_t *mctx, state_array_t *path, int top_node,
 	 Can't we unify them?  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 check_arrival_add_next_nodes (re_match_context_t *mctx, int str_idx,
 			      re_node_set *cur_nodes, re_node_set *next_nodes)
 {
@@ -3176,7 +3186,7 @@ check_arrival_expand_ecl (const re_dfa_t *dfa, re_node_set *cur_nodes,
    problematic append it to DST_NODES.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 check_arrival_expand_ecl_sub (const re_dfa_t *dfa, re_node_set *dst_nodes,
 			      int target, int ex_subexp, int type)
 {
@@ -3220,7 +3230,7 @@ check_arrival_expand_ecl_sub (const re_dfa_t *dfa, re_node_set *dst_nodes,
    in MCTX->BKREF_ENTS.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 expand_bkref_cache (re_match_context_t *mctx, re_node_set *cur_nodes,
 		    int cur_str, int subexp_num, int type)
 {
@@ -3357,12 +3367,14 @@ build_trtable (const re_dfa_t *dfa, re_dfastate_t *state)
   if (BE (ndests <= 0, 0))
     {
       if (dests_node_malloced)
-	free (dests_alloc);
+	re_free (dests_alloc);
       /* Return 0 in case of an error, 1 otherwise.  */
       if (ndests == 0)
 	{
 	  state->trtable = (re_dfastate_t **)
 	    calloc (sizeof (re_dfastate_t *), SBC_MAX);
+	  if (BE (state->trtable == NULL, 0))
+	    return 0;
 	  return 1;
 	}
       return 0;
@@ -3387,18 +3399,18 @@ build_trtable (const re_dfa_t *dfa, re_dfastate_t *state)
   else
 #endif
     {
-      dest_states = (re_dfastate_t **)
-	malloc (ndests * 3 * sizeof (re_dfastate_t *));
+      dest_states =
+	re_malloc (re_dfastate_t *, ndests * 3);
       if (BE (dest_states == NULL, 0))
 	{
 out_free:
 	  if (dest_states_malloced)
-	    free (dest_states);
+	    re_free (dest_states);
 	  re_node_set_free (&follows);
 	  for (i = 0; i < ndests; ++i)
 	    re_node_set_free (dests_node + i);
 	  if (dests_node_malloced)
-	    free (dests_alloc);
+	    re_free (dests_alloc);
 	  return 0;
 	}
       dest_states_malloced = true;
@@ -3529,14 +3541,14 @@ build_trtable (const re_dfa_t *dfa, re_dfastate_t *state)
     }
 
   if (dest_states_malloced)
-    free (dest_states);
+    re_free (dest_states);
 
   re_node_set_free (&follows);
   for (i = 0; i < ndests; ++i)
     re_node_set_free (dests_node + i);
 
   if (dests_node_malloced)
-    free (dests_alloc);
+    re_free (dests_alloc);
 
   return 1;
 }
@@ -3736,6 +3748,10 @@ group_nodes_into_DFAstates (const re_dfa_t *dfa, const re_dfastate_t *state,
    one collating element like '.', '[a-z]', opposite to the other nodes
    can only accept one byte.  */
 
+# ifdef _LIBC
+#  include <locale/weight.h>
+# endif
+
 static int
 internal_function
 check_node_accept_bytes (const re_dfa_t *dfa, int node_idx,
@@ -3857,8 +3873,6 @@ check_node_accept_bytes (const re_dfa_t *dfa, int node_idx,
 	  const int32_t *table, *indirect;
 	  const unsigned char *weights, *extra;
 	  const char *collseqwc;
-	  /* This #include defines a local function!  */
-#  include <locale/weight.h>
 
 	  /* match with collating_symbol?  */
 	  if (cset->ncoll_syms)
@@ -3914,7 +3928,7 @@ check_node_accept_bytes (const re_dfa_t *dfa, int node_idx,
 		_NL_CURRENT (LC_COLLATE, _NL_COLLATE_EXTRAMB);
 	      indirect = (const int32_t *)
 		_NL_CURRENT (LC_COLLATE, _NL_COLLATE_INDIRECTMB);
-	      int32_t idx = findidx (&cp);
+	      int32_t idx = findidx (table, indirect, extra, &cp, elem_len);
 	      if (idx > 0)
 		for (i = 0; i < cset->nequiv_classes; ++i)
 		  {
@@ -3945,18 +3959,10 @@ check_node_accept_bytes (const re_dfa_t *dfa, int node_idx,
 # endif /* _LIBC */
 	{
 	  /* match with range expression?  */
-#if __GNUC__ >= 2
-	  wchar_t cmp_buf[] = {L'\0', L'\0', wc, L'\0', L'\0', L'\0'};
-#else
-	  wchar_t cmp_buf[] = {L'\0', L'\0', L'\0', L'\0', L'\0', L'\0'};
-	  cmp_buf[2] = wc;
-#endif
 	  for (i = 0; i < cset->nranges; ++i)
 	    {
-	      cmp_buf[0] = cset->range_starts[i];
-	      cmp_buf[4] = cset->range_ends[i];
-	      if (wcscoll (cmp_buf, cmp_buf + 2) <= 0
-		  && wcscoll (cmp_buf + 2, cmp_buf + 4) <= 0)
+              if (cset->range_starts[i] <= wc
+                  && wc <= cset->range_ends[i])
 		{
 		  match_len = char_len;
 		  goto check_node_accept_bytes_match;
@@ -4025,7 +4031,7 @@ find_collation_sequence_value (const unsigned char *mbs, size_t mbs_len)
 	  /* Skip the collation sequence value.  */
 	  idx += sizeof (uint32_t);
 	  /* Skip the wide char sequence of the collating element.  */
-	  idx = idx + sizeof (uint32_t) * (extra[idx] + 1);
+	  idx = idx + sizeof (uint32_t) * (*(int32_t *) (extra + idx) + 1);
 	  /* If we found the entry, return the sequence value.  */
 	  if (found)
 	    return *(uint32_t *) (extra + idx);
@@ -4092,8 +4098,8 @@ check_node_accept (const re_match_context_t *mctx, const re_token_t *node,
 /* Extend the buffers, if the buffers have run out.  */
 
 static reg_errcode_t
-internal_function
-extend_buffers (re_match_context_t *mctx)
+internal_function __attribute_warn_unused_result__
+extend_buffers (re_match_context_t *mctx, int min_len)
 {
   reg_errcode_t ret;
   re_string_t *pstr = &mctx->input;
@@ -4102,8 +4108,10 @@ extend_buffers (re_match_context_t *mctx)
   if (BE (INT_MAX / 2 / sizeof (re_dfastate_t *) <= pstr->bufs_len, 0))
     return REG_ESPACE;
 
-  /* Double the lengthes of the buffers.  */
-  ret = re_string_realloc_buffers (pstr, pstr->bufs_len * 2);
+  /* Double the lengthes of the buffers, but allocate at least MIN_LEN.  */
+  ret = re_string_realloc_buffers (pstr,
+				   MAX (min_len,
+					MIN (pstr->len, pstr->bufs_len * 2)));
   if (BE (ret != REG_NOERROR, 0))
     return ret;
 
@@ -4155,7 +4163,7 @@ extend_buffers (re_match_context_t *mctx)
 /* Initialize MCTX.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 match_ctx_init (re_match_context_t *mctx, int eflags, int n)
 {
   mctx->eflags = eflags;
@@ -4203,7 +4211,7 @@ match_ctx_clean (re_match_context_t *mctx)
 	  re_free (top->path->array);
 	  re_free (top->path);
 	}
-      free (top);
+      re_free (top);
     }
 
   mctx->nsub_tops = 0;
@@ -4228,7 +4236,7 @@ match_ctx_free (re_match_context_t *mctx)
 */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 match_ctx_add_entry (re_match_context_t *mctx, int node, int str_idx, int from,
 		     int to)
 {
@@ -4300,7 +4308,7 @@ search_cur_bkref_entry (const re_match_context_t *mctx, int str_idx)
    at STR_IDX.  */
 
 static reg_errcode_t
-internal_function
+internal_function __attribute_warn_unused_result__
 match_ctx_add_subtop (re_match_context_t *mctx, int node, int str_idx)
 {
 #ifdef DEBUG
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 7/7] fixup! compat/regex: update the gawk regex engine from upstream
  2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
                   ` (5 preceding siblings ...)
  2017-05-04 22:00 ` [PATCH 6/7] " Ævar Arnfjörð Bjarmason
@ 2017-05-04 22:00 ` Ævar Arnfjörð Bjarmason
  2017-05-08  0:55 ` [PATCH 0/7] Update the compat/regex " Junio C Hamano
  2017-05-12  0:31 ` Junio C Hamano
  8 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-04 22:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins,
	Ævar Arnfjörð Bjarmason

---
 compat/regex/intprops.h | 448 ++++++++++++++++++++++++++++++++++++++++++++++++
 compat/regex/verify.h   | 286 +++++++++++++++++++++++++++++++
 2 files changed, 734 insertions(+)
 create mode 100644 compat/regex/intprops.h
 create mode 100644 compat/regex/verify.h

diff --git a/compat/regex/intprops.h b/compat/regex/intprops.h
new file mode 100644
index 0000000000..29f7f40837
--- /dev/null
+++ b/compat/regex/intprops.h
@@ -0,0 +1,448 @@
+/*
+ * This is git.git's copy of gawk.git's regex engine. Please see that
+ * project for the latest version & to submit patches to this code,
+ * and git.git's compat/regex/README for information on how git's copy
+ * of this code is maintained.
+ */
+
+/* intprops.h -- properties of integer types
+
+   Copyright (C) 2001-2016 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify it
+   under the terms of the GNU Lesser General Public License as published
+   by the Free Software Foundation; either version 2.1 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+/* Written by Paul Eggert.  */
+
+#ifndef _GL_INTPROPS_H
+#define _GL_INTPROPS_H
+
+#include <limits.h>
+
+#ifndef __has_builtin
+# define __has_builtin(x) 0
+#endif
+
+/* Return a value with the common real type of E and V and the value of V.  */
+#define _GL_INT_CONVERT(e, v) (0 * (e) + (v))
+
+/* Act like _GL_INT_CONVERT (E, -V) but work around a bug in IRIX 6.5 cc; see
+   <http://lists.gnu.org/archive/html/bug-gnulib/2011-05/msg00406.html>.  */
+#define _GL_INT_NEGATE_CONVERT(e, v) (0 * (e) - (v))
+
+/* The extra casts in the following macros work around compiler bugs,
+   e.g., in Cray C 5.0.3.0.  */
+
+/* True if the arithmetic type T is an integer type.  bool counts as
+   an integer.  */
+#define TYPE_IS_INTEGER(t) ((t) 1.5 == 1)
+
+/* True if the real type T is signed.  */
+#define TYPE_SIGNED(t) (! ((t) 0 < (t) -1))
+
+/* Return 1 if the real expression E, after promotion, has a
+   signed or floating type.  */
+#define EXPR_SIGNED(e) (_GL_INT_NEGATE_CONVERT (e, 1) < 0)
+
+
+/* Minimum and maximum values for integer types and expressions.  */
+
+/* The width in bits of the integer type or expression T.
+   Padding bits are not supported; this is checked at compile-time below.  */
+#define TYPE_WIDTH(t) (sizeof (t) * CHAR_BIT)
+
+/* The maximum and minimum values for the integer type T.  */
+#define TYPE_MINIMUM(t) ((t) ~ TYPE_MAXIMUM (t))
+#define TYPE_MAXIMUM(t)                                                 \
+  ((t) (! TYPE_SIGNED (t)                                               \
+        ? (t) -1                                                        \
+        : ((((t) 1 << (TYPE_WIDTH (t) - 2)) - 1) * 2 + 1)))
+
+/* The maximum and minimum values for the type of the expression E,
+   after integer promotion.  E should not have side effects.  */
+#define _GL_INT_MINIMUM(e)                                              \
+  (EXPR_SIGNED (e)                                                      \
+   ? ~ _GL_SIGNED_INT_MAXIMUM (e)                                       \
+   : _GL_INT_CONVERT (e, 0))
+#define _GL_INT_MAXIMUM(e)                                              \
+  (EXPR_SIGNED (e)                                                      \
+   ? _GL_SIGNED_INT_MAXIMUM (e)                                         \
+   : _GL_INT_NEGATE_CONVERT (e, 1))
+#define _GL_SIGNED_INT_MAXIMUM(e)                                       \
+  (((_GL_INT_CONVERT (e, 1) << (TYPE_WIDTH ((e) + 0) - 2)) - 1) * 2 + 1)
+
+/* Work around OpenVMS incompatibility with C99.  */
+#if !defined LLONG_MAX && defined __INT64_MAX
+# define LLONG_MAX __INT64_MAX
+# define LLONG_MIN __INT64_MIN
+#endif
+
+/* Does the __typeof__ keyword work?  This could be done by
+   'configure', but for now it's easier to do it by hand.  */
+#if (2 <= __GNUC__ \
+     || (1210 <= __IBMC__ && defined __IBM__TYPEOF__) \
+     || (0x5110 <= __SUNPRO_C && !__STDC__))
+# define _GL_HAVE___TYPEOF__ 1
+#else
+# define _GL_HAVE___TYPEOF__ 0
+#endif
+
+/* Return 1 if the integer type or expression T might be signed.  Return 0
+   if it is definitely unsigned.  This macro does not evaluate its argument,
+   and expands to an integer constant expression.  */
+#if _GL_HAVE___TYPEOF__
+# define _GL_SIGNED_TYPE_OR_EXPR(t) TYPE_SIGNED (__typeof__ (t))
+#else
+# define _GL_SIGNED_TYPE_OR_EXPR(t) 1
+#endif
+
+/* Bound on length of the string representing an unsigned integer
+   value representable in B bits.  log10 (2.0) < 146/485.  The
+   smallest value of B where this bound is not tight is 2621.  */
+#define INT_BITS_STRLEN_BOUND(b) (((b) * 146 + 484) / 485)
+
+/* Bound on length of the string representing an integer type or expression T.
+   Subtract 1 for the sign bit if T is signed, and then add 1 more for
+   a minus sign if needed.
+
+   Because _GL_SIGNED_TYPE_OR_EXPR sometimes returns 0 when its argument is
+   signed, this macro may overestimate the true bound by one byte when
+   applied to unsigned types of size 2, 4, 16, ... bytes.  */
+#define INT_STRLEN_BOUND(t)                                     \
+  (INT_BITS_STRLEN_BOUND (TYPE_WIDTH (t) - _GL_SIGNED_TYPE_OR_EXPR (t)) \
+   + _GL_SIGNED_TYPE_OR_EXPR (t))
+
+/* Bound on buffer size needed to represent an integer type or expression T,
+   including the terminating null.  */
+#define INT_BUFSIZE_BOUND(t) (INT_STRLEN_BOUND (t) + 1)
+
+
+/* Range overflow checks.
+
+   The INT_<op>_RANGE_OVERFLOW macros return 1 if the corresponding C
+   operators might not yield numerically correct answers due to
+   arithmetic overflow.  They do not rely on undefined or
+   implementation-defined behavior.  Their implementations are simple
+   and straightforward, but they are a bit harder to use than the
+   INT_<op>_OVERFLOW macros described below.
+
+   Example usage:
+
+     long int i = ...;
+     long int j = ...;
+     if (INT_MULTIPLY_RANGE_OVERFLOW (i, j, LONG_MIN, LONG_MAX))
+       printf ("multiply would overflow");
+     else
+       printf ("product is %ld", i * j);
+
+   Restrictions on *_RANGE_OVERFLOW macros:
+
+   These macros do not check for all possible numerical problems or
+   undefined or unspecified behavior: they do not check for division
+   by zero, for bad shift counts, or for shifting negative numbers.
+
+   These macros may evaluate their arguments zero or multiple times,
+   so the arguments should not have side effects.  The arithmetic
+   arguments (including the MIN and MAX arguments) must be of the same
+   integer type after the usual arithmetic conversions, and the type
+   must have minimum value MIN and maximum MAX.  Unsigned types should
+   use a zero MIN of the proper type.
+
+   These macros are tuned for constant MIN and MAX.  For commutative
+   operations such as A + B, they are also tuned for constant B.  */
+
+/* Return 1 if A + B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  */
+#define INT_ADD_RANGE_OVERFLOW(a, b, min, max)          \
+  ((b) < 0                                              \
+   ? (a) < (min) - (b)                                  \
+   : (max) - (b) < (a))
+
+/* Return 1 if A - B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  */
+#define INT_SUBTRACT_RANGE_OVERFLOW(a, b, min, max)     \
+  ((b) < 0                                              \
+   ? (max) + (b) < (a)                                  \
+   : (a) < (min) + (b))
+
+/* Return 1 if - A would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  */
+#define INT_NEGATE_RANGE_OVERFLOW(a, min, max)          \
+  ((min) < 0                                            \
+   ? (a) < - (max)                                      \
+   : 0 < (a))
+
+/* Return 1 if A * B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  Avoid && and || as they tickle
+   bugs in Sun C 5.11 2010/08/13 and other compilers; see
+   <http://lists.gnu.org/archive/html/bug-gnulib/2011-05/msg00401.html>.  */
+#define INT_MULTIPLY_RANGE_OVERFLOW(a, b, min, max)     \
+  ((b) < 0                                              \
+   ? ((a) < 0                                           \
+      ? (a) < (max) / (b)                               \
+      : (b) == -1                                       \
+      ? 0                                               \
+      : (min) / (b) < (a))                              \
+   : (b) == 0                                           \
+   ? 0                                                  \
+   : ((a) < 0                                           \
+      ? (a) < (min) / (b)                               \
+      : (max) / (b) < (a)))
+
+/* Return 1 if A / B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  Do not check for division by zero.  */
+#define INT_DIVIDE_RANGE_OVERFLOW(a, b, min, max)       \
+  ((min) < 0 && (b) == -1 && (a) < - (max))
+
+/* Return 1 if A % B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  Do not check for division by zero.
+   Mathematically, % should never overflow, but on x86-like hosts
+   INT_MIN % -1 traps, and the C standard permits this, so treat this
+   as an overflow too.  */
+#define INT_REMAINDER_RANGE_OVERFLOW(a, b, min, max)    \
+  INT_DIVIDE_RANGE_OVERFLOW (a, b, min, max)
+
+/* Return 1 if A << B would overflow in [MIN,MAX] arithmetic.
+   See above for restrictions.  Here, MIN and MAX are for A only, and B need
+   not be of the same type as the other arguments.  The C standard says that
+   behavior is undefined for shifts unless 0 <= B < wordwidth, and that when
+   A is negative then A << B has undefined behavior and A >> B has
+   implementation-defined behavior, but do not check these other
+   restrictions.  */
+#define INT_LEFT_SHIFT_RANGE_OVERFLOW(a, b, min, max)   \
+  ((a) < 0                                              \
+   ? (a) < (min) >> (b)                                 \
+   : (max) >> (b) < (a))
+
+/* True if __builtin_add_overflow (A, B, P) works when P is non-null.  */
+#define _GL_HAS_BUILTIN_OVERFLOW \
+  (5 <= __GNUC__ || __has_builtin (__builtin_add_overflow))
+
+/* True if __builtin_add_overflow_p (A, B, C) works.  */
+#define _GL_HAS_BUILTIN_OVERFLOW_P \
+  (7 <= __GNUC__ || __has_builtin (__builtin_add_overflow_p))
+
+/* The _GL*_OVERFLOW macros have the same restrictions as the
+   *_RANGE_OVERFLOW macros, except that they do not assume that operands
+   (e.g., A and B) have the same type as MIN and MAX.  Instead, they assume
+   that the result (e.g., A + B) has that type.  */
+#if _GL_HAS_BUILTIN_OVERFLOW_P
+# define _GL_ADD_OVERFLOW(a, b, min, max)                               \
+   __builtin_add_overflow_p (a, b, (__typeof__ ((a) + (b))) 0)
+# define _GL_SUBTRACT_OVERFLOW(a, b, min, max)                          \
+   __builtin_sub_overflow_p (a, b, (__typeof__ ((a) - (b))) 0)
+# define _GL_MULTIPLY_OVERFLOW(a, b, min, max)                          \
+   __builtin_mul_overflow_p (a, b, (__typeof__ ((a) * (b))) 0)
+#else
+# define _GL_ADD_OVERFLOW(a, b, min, max)                                \
+   ((min) < 0 ? INT_ADD_RANGE_OVERFLOW (a, b, min, max)                  \
+    : (a) < 0 ? (b) <= (a) + (b)                                         \
+    : (b) < 0 ? (a) <= (a) + (b)                                         \
+    : (a) + (b) < (b))
+# define _GL_SUBTRACT_OVERFLOW(a, b, min, max)                           \
+   ((min) < 0 ? INT_SUBTRACT_RANGE_OVERFLOW (a, b, min, max)             \
+    : (a) < 0 ? 1                                                        \
+    : (b) < 0 ? (a) - (b) <= (a)                                         \
+    : (a) < (b))
+# define _GL_MULTIPLY_OVERFLOW(a, b, min, max)                           \
+   (((min) == 0 && (((a) < 0 && 0 < (b)) || ((b) < 0 && 0 < (a))))       \
+    || INT_MULTIPLY_RANGE_OVERFLOW (a, b, min, max))
+#endif
+#define _GL_DIVIDE_OVERFLOW(a, b, min, max)                             \
+  ((min) < 0 ? (b) == _GL_INT_NEGATE_CONVERT (min, 1) && (a) < - (max)  \
+   : (a) < 0 ? (b) <= (a) + (b) - 1                                     \
+   : (b) < 0 && (a) + (b) <= (a))
+#define _GL_REMAINDER_OVERFLOW(a, b, min, max)                          \
+  ((min) < 0 ? (b) == _GL_INT_NEGATE_CONVERT (min, 1) && (a) < - (max)  \
+   : (a) < 0 ? (a) % (b) != ((max) - (b) + 1) % (b)                     \
+   : (b) < 0 && ! _GL_UNSIGNED_NEG_MULTIPLE (a, b, max))
+
+/* Return a nonzero value if A is a mathematical multiple of B, where
+   A is unsigned, B is negative, and MAX is the maximum value of A's
+   type.  A's type must be the same as (A % B)'s type.  Normally (A %
+   -B == 0) suffices, but things get tricky if -B would overflow.  */
+#define _GL_UNSIGNED_NEG_MULTIPLE(a, b, max)                            \
+  (((b) < -_GL_SIGNED_INT_MAXIMUM (b)                                   \
+    ? (_GL_SIGNED_INT_MAXIMUM (b) == (max)                              \
+       ? (a)                                                            \
+       : (a) % (_GL_INT_CONVERT (a, _GL_SIGNED_INT_MAXIMUM (b)) + 1))   \
+    : (a) % - (b))                                                      \
+   == 0)
+
+/* Check for integer overflow, and report low order bits of answer.
+
+   The INT_<op>_OVERFLOW macros return 1 if the corresponding C operators
+   might not yield numerically correct answers due to arithmetic overflow.
+   The INT_<op>_WRAPV macros also store the low-order bits of the answer.
+   These macros work correctly on all known practical hosts, and do not rely
+   on undefined behavior due to signed arithmetic overflow.
+
+   Example usage, assuming A and B are long int:
+
+     if (INT_MULTIPLY_OVERFLOW (a, b))
+       printf ("result would overflow\n");
+     else
+       printf ("result is %ld (no overflow)\n", a * b);
+
+   Example usage with WRAPV flavor:
+
+     long int result;
+     bool overflow = INT_MULTIPLY_WRAPV (a, b, &result);
+     printf ("result is %ld (%s)\n", result,
+             overflow ? "after overflow" : "no overflow");
+
+   Restrictions on these macros:
+
+   These macros do not check for all possible numerical problems or
+   undefined or unspecified behavior: they do not check for division
+   by zero, for bad shift counts, or for shifting negative numbers.
+
+   These macros may evaluate their arguments zero or multiple times, so the
+   arguments should not have side effects.
+
+   The WRAPV macros are not constant expressions.  They support only
+   +, binary -, and *.  The result type must be signed.
+
+   These macros are tuned for their last argument being a constant.
+
+   Return 1 if the integer expressions A * B, A - B, -A, A * B, A / B,
+   A % B, and A << B would overflow, respectively.  */
+
+#define INT_ADD_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_ADD_OVERFLOW)
+#define INT_SUBTRACT_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_SUBTRACT_OVERFLOW)
+#if _GL_HAS_BUILTIN_OVERFLOW_P
+# define INT_NEGATE_OVERFLOW(a) INT_SUBTRACT_OVERFLOW (0, a)
+#else
+# define INT_NEGATE_OVERFLOW(a) \
+   INT_NEGATE_RANGE_OVERFLOW (a, _GL_INT_MINIMUM (a), _GL_INT_MAXIMUM (a))
+#endif
+#define INT_MULTIPLY_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_MULTIPLY_OVERFLOW)
+#define INT_DIVIDE_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_DIVIDE_OVERFLOW)
+#define INT_REMAINDER_OVERFLOW(a, b) \
+  _GL_BINARY_OP_OVERFLOW (a, b, _GL_REMAINDER_OVERFLOW)
+#define INT_LEFT_SHIFT_OVERFLOW(a, b) \
+  INT_LEFT_SHIFT_RANGE_OVERFLOW (a, b, \
+                                 _GL_INT_MINIMUM (a), _GL_INT_MAXIMUM (a))
+
+/* Return 1 if the expression A <op> B would overflow,
+   where OP_RESULT_OVERFLOW (A, B, MIN, MAX) does the actual test,
+   assuming MIN and MAX are the minimum and maximum for the result type.
+   Arguments should be free of side effects.  */
+#define _GL_BINARY_OP_OVERFLOW(a, b, op_result_overflow)        \
+  op_result_overflow (a, b,                                     \
+                      _GL_INT_MINIMUM (0 * (b) + (a)),          \
+                      _GL_INT_MAXIMUM (0 * (b) + (a)))
+
+/* Store the low-order bits of A + B, A - B, A * B, respectively, into *R.
+   Return 1 if the result overflows.  See above for restrictions.  */
+#define INT_ADD_WRAPV(a, b, r) \
+  _GL_INT_OP_WRAPV (a, b, r, +, __builtin_add_overflow, INT_ADD_OVERFLOW)
+#define INT_SUBTRACT_WRAPV(a, b, r) \
+  _GL_INT_OP_WRAPV (a, b, r, -, __builtin_sub_overflow, INT_SUBTRACT_OVERFLOW)
+#define INT_MULTIPLY_WRAPV(a, b, r) \
+  _GL_INT_OP_WRAPV (a, b, r, *, __builtin_mul_overflow, INT_MULTIPLY_OVERFLOW)
+
+/* Nonzero if this compiler has GCC bug 68193 or Clang bug 25390.  See:
+   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68193
+   https://llvm.org/bugs/show_bug.cgi?id=25390
+   For now, assume all versions of GCC-like compilers generate bogus
+   warnings for _Generic.  This matters only for older compilers that
+   lack __builtin_add_overflow.  */
+#if __GNUC__
+# define _GL__GENERIC_BOGUS 1
+#else
+# define _GL__GENERIC_BOGUS 0
+#endif
+
+/* Store the low-order bits of A <op> B into *R, where OP specifies
+   the operation.  BUILTIN is the builtin operation, and OVERFLOW the
+   overflow predicate.  Return 1 if the result overflows.  See above
+   for restrictions.  */
+#if _GL_HAS_BUILTIN_OVERFLOW
+# define _GL_INT_OP_WRAPV(a, b, r, op, builtin, overflow) builtin (a, b, r)
+#elif 201112 <= __STDC_VERSION__ && !_GL__GENERIC_BOGUS
+# define _GL_INT_OP_WRAPV(a, b, r, op, builtin, overflow) \
+   (_Generic \
+    (*(r), \
+     signed char: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned char, \
+                        signed char, SCHAR_MIN, SCHAR_MAX), \
+     short int: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned short int, \
+                        short int, SHRT_MIN, SHRT_MAX), \
+     int: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \
+                        int, INT_MIN, INT_MAX), \
+     long int: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long int, \
+                        long int, LONG_MIN, LONG_MAX), \
+     long long int: \
+       _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long long int, \
+                        long long int, LLONG_MIN, LLONG_MAX)))
+#else
+# define _GL_INT_OP_WRAPV(a, b, r, op, builtin, overflow) \
+   (sizeof *(r) == sizeof (signed char) \
+    ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned char, \
+                       signed char, SCHAR_MIN, SCHAR_MAX) \
+    : sizeof *(r) == sizeof (short int) \
+    ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned short int, \
+                       short int, SHRT_MIN, SHRT_MAX) \
+    : sizeof *(r) == sizeof (int) \
+    ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned int, \
+                       int, INT_MIN, INT_MAX) \
+    : _GL_INT_OP_WRAPV_LONGISH(a, b, r, op, overflow))
+# ifdef LLONG_MAX
+#  define _GL_INT_OP_WRAPV_LONGISH(a, b, r, op, overflow) \
+    (sizeof *(r) == sizeof (long int) \
+     ? _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long int, \
+                        long int, LONG_MIN, LONG_MAX) \
+     : _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long long int, \
+                        long long int, LLONG_MIN, LLONG_MAX))
+# else
+#  define _GL_INT_OP_WRAPV_LONGISH(a, b, r, op, overflow) \
+    _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long int, \
+                     long int, LONG_MIN, LONG_MAX)
+# endif
+#endif
+
+/* Store the low-order bits of A <op> B into *R, where the operation
+   is given by OP.  Use the unsigned type UT for calculation to avoid
+   overflow problems.  *R's type is T, with extremal values TMIN and
+   TMAX.  T must be a signed integer type.  Return 1 if the result
+   overflows.  */
+#define _GL_INT_OP_CALC(a, b, r, op, overflow, ut, t, tmin, tmax) \
+  (sizeof ((a) op (b)) < sizeof (t) \
+   ? _GL_INT_OP_CALC1 ((t) (a), (t) (b), r, op, overflow, ut, t, tmin, tmax) \
+   : _GL_INT_OP_CALC1 (a, b, r, op, overflow, ut, t, tmin, tmax))
+#define _GL_INT_OP_CALC1(a, b, r, op, overflow, ut, t, tmin, tmax) \
+  ((overflow (a, b) \
+    || (EXPR_SIGNED ((a) op (b)) && ((a) op (b)) < (tmin)) \
+    || (tmax) < ((a) op (b))) \
+   ? (*(r) = _GL_INT_OP_WRAPV_VIA_UNSIGNED (a, b, op, ut, t, tmin, tmax), 1) \
+   : (*(r) = _GL_INT_OP_WRAPV_VIA_UNSIGNED (a, b, op, ut, t, tmin, tmax), 0))
+
+/* Return A <op> B, where the operation is given by OP.  Use the
+   unsigned type UT for calculation to avoid overflow problems.
+   Convert the result to type T without overflow by subtracting TMIN
+   from large values before converting, and adding it afterwards.
+   Compilers can optimize all the operations except OP.  */
+#define _GL_INT_OP_WRAPV_VIA_UNSIGNED(a, b, op, ut, t, tmin, tmax) \
+  (((ut) (a) op (ut) (b)) <= (tmax) \
+   ? (t) ((ut) (a) op (ut) (b)) \
+   : ((t) (((ut) (a) op (ut) (b)) - (tmin)) + (tmin)))
+
+#endif /* _GL_INTPROPS_H */
diff --git a/compat/regex/verify.h b/compat/regex/verify.h
new file mode 100644
index 0000000000..e865af5298
--- /dev/null
+++ b/compat/regex/verify.h
@@ -0,0 +1,286 @@
+/*
+ * This is git.git's copy of gawk.git's regex engine. Please see that
+ * project for the latest version & to submit patches to this code,
+ * and git.git's compat/regex/README for information on how git's copy
+ * of this code is maintained.
+ */
+
+/* Compile-time assert-like macros.
+
+   Copyright (C) 2005-2006, 2009-2016 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+/* Written by Paul Eggert, Bruno Haible, and Jim Meyering.  */
+
+#ifndef _GL_VERIFY_H
+#define _GL_VERIFY_H
+
+
+/* Define _GL_HAVE__STATIC_ASSERT to 1 if _Static_assert works as per C11.
+   This is supported by GCC 4.6.0 and later, in C mode, and its use
+   here generates easier-to-read diagnostics when verify (R) fails.
+
+   Define _GL_HAVE_STATIC_ASSERT to 1 if static_assert works as per C++11.
+   This will likely be supported by future GCC versions, in C++ mode.
+
+   Use this only with GCC.  If we were willing to slow 'configure'
+   down we could also use it with other compilers, but since this
+   affects only the quality of diagnostics, why bother?  */
+#if (4 < __GNUC__ + (6 <= __GNUC_MINOR__) \
+     && (201112L <= __STDC_VERSION__  || !defined __STRICT_ANSI__) \
+     && !defined __cplusplus)
+# define _GL_HAVE__STATIC_ASSERT 1
+#endif
+/* The condition (99 < __GNUC__) is temporary, until we know about the
+   first G++ release that supports static_assert.  */
+#if (99 < __GNUC__) && defined __cplusplus
+# define _GL_HAVE_STATIC_ASSERT 1
+#endif
+
+/* FreeBSD 9.1 <sys/cdefs.h>, included by <stddef.h> and lots of other
+   system headers, defines a conflicting _Static_assert that is no
+   better than ours; override it.  */
+#ifndef _GL_HAVE_STATIC_ASSERT
+# include <stddef.h>
+# undef _Static_assert
+#endif
+
+/* Each of these macros verifies that its argument R is nonzero.  To
+   be portable, R should be an integer constant expression.  Unlike
+   assert (R), there is no run-time overhead.
+
+   If _Static_assert works, verify (R) uses it directly.  Similarly,
+   _GL_VERIFY_TRUE works by packaging a _Static_assert inside a struct
+   that is an operand of sizeof.
+
+   The code below uses several ideas for C++ compilers, and for C
+   compilers that do not support _Static_assert:
+
+   * The first step is ((R) ? 1 : -1).  Given an expression R, of
+     integral or boolean or floating-point type, this yields an
+     expression of integral type, whose value is later verified to be
+     constant and nonnegative.
+
+   * Next this expression W is wrapped in a type
+     struct _gl_verify_type {
+       unsigned int _gl_verify_error_if_negative: W;
+     }.
+     If W is negative, this yields a compile-time error.  No compiler can
+     deal with a bit-field of negative size.
+
+     One might think that an array size check would have the same
+     effect, that is, that the type struct { unsigned int dummy[W]; }
+     would work as well.  However, inside a function, some compilers
+     (such as C++ compilers and GNU C) allow local parameters and
+     variables inside array size expressions.  With these compilers,
+     an array size check would not properly diagnose this misuse of
+     the verify macro:
+
+       void function (int n) { verify (n < 0); }
+
+   * For the verify macro, the struct _gl_verify_type will need to
+     somehow be embedded into a declaration.  To be portable, this
+     declaration must declare an object, a constant, a function, or a
+     typedef name.  If the declared entity uses the type directly,
+     such as in
+
+       struct dummy {...};
+       typedef struct {...} dummy;
+       extern struct {...} *dummy;
+       extern void dummy (struct {...} *);
+       extern struct {...} *dummy (void);
+
+     two uses of the verify macro would yield colliding declarations
+     if the entity names are not disambiguated.  A workaround is to
+     attach the current line number to the entity name:
+
+       #define _GL_CONCAT0(x, y) x##y
+       #define _GL_CONCAT(x, y) _GL_CONCAT0 (x, y)
+       extern struct {...} * _GL_CONCAT (dummy, __LINE__);
+
+     But this has the problem that two invocations of verify from
+     within the same macro would collide, since the __LINE__ value
+     would be the same for both invocations.  (The GCC __COUNTER__
+     macro solves this problem, but is not portable.)
+
+     A solution is to use the sizeof operator.  It yields a number,
+     getting rid of the identity of the type.  Declarations like
+
+       extern int dummy [sizeof (struct {...})];
+       extern void dummy (int [sizeof (struct {...})]);
+       extern int (*dummy (void)) [sizeof (struct {...})];
+
+     can be repeated.
+
+   * Should the implementation use a named struct or an unnamed struct?
+     Which of the following alternatives can be used?
+
+       extern int dummy [sizeof (struct {...})];
+       extern int dummy [sizeof (struct _gl_verify_type {...})];
+       extern void dummy (int [sizeof (struct {...})]);
+       extern void dummy (int [sizeof (struct _gl_verify_type {...})]);
+       extern int (*dummy (void)) [sizeof (struct {...})];
+       extern int (*dummy (void)) [sizeof (struct _gl_verify_type {...})];
+
+     In the second and sixth case, the struct type is exported to the
+     outer scope; two such declarations therefore collide.  GCC warns
+     about the first, third, and fourth cases.  So the only remaining
+     possibility is the fifth case:
+
+       extern int (*dummy (void)) [sizeof (struct {...})];
+
+   * GCC warns about duplicate declarations of the dummy function if
+     -Wredundant-decls is used.  GCC 4.3 and later have a builtin
+     __COUNTER__ macro that can let us generate unique identifiers for
+     each dummy function, to suppress this warning.
+
+   * This implementation exploits the fact that older versions of GCC,
+     which do not support _Static_assert, also do not warn about the
+     last declaration mentioned above.
+
+   * GCC warns if -Wnested-externs is enabled and verify() is used
+     within a function body; but inside a function, you can always
+     arrange to use verify_expr() instead.
+
+   * In C++, any struct definition inside sizeof is invalid.
+     Use a template type to work around the problem.  */
+
+/* Concatenate two preprocessor tokens.  */
+#define _GL_CONCAT(x, y) _GL_CONCAT0 (x, y)
+#define _GL_CONCAT0(x, y) x##y
+
+/* _GL_COUNTER is an integer, preferably one that changes each time we
+   use it.  Use __COUNTER__ if it works, falling back on __LINE__
+   otherwise.  __LINE__ isn't perfect, but it's better than a
+   constant.  */
+#if defined __COUNTER__ && __COUNTER__ != __COUNTER__
+# define _GL_COUNTER __COUNTER__
+#else
+# define _GL_COUNTER __LINE__
+#endif
+
+/* Generate a symbol with the given prefix, making it unique if
+   possible.  */
+#define _GL_GENSYM(prefix) _GL_CONCAT (prefix, _GL_COUNTER)
+
+/* Verify requirement R at compile-time, as an integer constant expression
+   that returns 1.  If R is false, fail at compile-time, preferably
+   with a diagnostic that includes the string-literal DIAGNOSTIC.  */
+
+#define _GL_VERIFY_TRUE(R, DIAGNOSTIC) \
+   (!!sizeof (_GL_VERIFY_TYPE (R, DIAGNOSTIC)))
+
+#ifdef __cplusplus
+# if !GNULIB_defined_struct__gl_verify_type
+template <int w>
+  struct _gl_verify_type {
+    unsigned int _gl_verify_error_if_negative: w;
+  };
+#  define GNULIB_defined_struct__gl_verify_type 1
+# endif
+# define _GL_VERIFY_TYPE(R, DIAGNOSTIC) \
+    _gl_verify_type<(R) ? 1 : -1>
+#elif defined _GL_HAVE__STATIC_ASSERT
+# define _GL_VERIFY_TYPE(R, DIAGNOSTIC) \
+    struct {                                   \
+      _Static_assert (R, DIAGNOSTIC);          \
+      int _gl_dummy;                          \
+    }
+#else
+# define _GL_VERIFY_TYPE(R, DIAGNOSTIC) \
+    struct { unsigned int _gl_verify_error_if_negative: (R) ? 1 : -1; }
+#endif
+
+/* Verify requirement R at compile-time, as a declaration without a
+   trailing ';'.  If R is false, fail at compile-time, preferably
+   with a diagnostic that includes the string-literal DIAGNOSTIC.
+
+   Unfortunately, unlike C11, this implementation must appear as an
+   ordinary declaration, and cannot appear inside struct { ... }.  */
+
+#ifdef _GL_HAVE__STATIC_ASSERT
+# define _GL_VERIFY _Static_assert
+#else
+# define _GL_VERIFY(R, DIAGNOSTIC)				       \
+    extern int (*_GL_GENSYM (_gl_verify_function) (void))	       \
+      [_GL_VERIFY_TRUE (R, DIAGNOSTIC)]
+#endif
+
+/* _GL_STATIC_ASSERT_H is defined if this code is copied into assert.h.  */
+#ifdef _GL_STATIC_ASSERT_H
+# if !defined _GL_HAVE__STATIC_ASSERT && !defined _Static_assert
+#  define _Static_assert(R, DIAGNOSTIC) _GL_VERIFY (R, DIAGNOSTIC)
+# endif
+# if !defined _GL_HAVE_STATIC_ASSERT && !defined static_assert
+#  define static_assert _Static_assert /* C11 requires this #define.  */
+# endif
+#endif
+
+/* @assert.h omit start@  */
+
+/* Each of these macros verifies that its argument R is nonzero.  To
+   be portable, R should be an integer constant expression.  Unlike
+   assert (R), there is no run-time overhead.
+
+   There are two macros, since no single macro can be used in all
+   contexts in C.  verify_true (R) is for scalar contexts, including
+   integer constant expression contexts.  verify (R) is for declaration
+   contexts, e.g., the top level.  */
+
+/* Verify requirement R at compile-time, as an integer constant expression.
+   Return 1.  This is equivalent to verify_expr (R, 1).
+
+   verify_true is obsolescent; please use verify_expr instead.  */
+
+#define verify_true(R) _GL_VERIFY_TRUE (R, "verify_true (" #R ")")
+
+/* Verify requirement R at compile-time.  Return the value of the
+   expression E.  */
+
+#define verify_expr(R, E) \
+   (_GL_VERIFY_TRUE (R, "verify_expr (" #R ", " #E ")") ? (E) : (E))
+
+/* Verify requirement R at compile-time, as a declaration without a
+   trailing ';'.  */
+
+#define verify(R) _GL_VERIFY (R, "verify (" #R ")")
+
+#ifndef __has_builtin
+# define __has_builtin(x) 0
+#endif
+
+/* Assume that R always holds.  This lets the compiler optimize
+   accordingly.  R should not have side-effects; it may or may not be
+   evaluated.  Behavior is undefined if R is false.  */
+
+#if (__has_builtin (__builtin_unreachable) \
+     || 4 < __GNUC__ + (5 <= __GNUC_MINOR__))
+# define assume(R) ((R) ? (void) 0 : __builtin_unreachable ())
+#elif 1200 <= _MSC_VER
+# define assume(R) __assume (R)
+#elif ((defined GCC_LINT || defined lint) \
+       && (__has_builtin (__builtin_trap) \
+           || 3 < __GNUC__ + (3 < __GNUC_MINOR__ + (4 <= __GNUC_PATCHLEVEL__))))
+  /* Doing it this way helps various packages when configured with
+     --enable-gcc-warnings, which compiles with -Dlint.  It's nicer
+     when 'assume' silences warnings even with older GCCs.  */
+# define assume(R) ((R) ? (void) 0 : __builtin_trap ())
+#else
+# define assume(R) ((void) (0 && (R)))
+#endif
+
+/* @assert.h omit end@  */
+
+#endif
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/7] fixup! compat/regex: update the gawk regex engine from upstream
  2017-05-04 22:00 ` [PATCH 3/7] fixup! " Ævar Arnfjörð Bjarmason
@ 2017-05-05  5:54   ` Johannes Sixt
  2017-05-05  6:12     ` [PATCH v2 8/7] " Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Sixt @ 2017-05-05  5:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins

Am 05.05.2017 um 00:00 schrieb Ævar Arnfjörð Bjarmason:
> ---
>  compat/regex/regcomp.c | 356 +++++++++++++++++++++++++++++--------------------
>  1 file changed, 209 insertions(+), 147 deletions(-)
>
> diff --git a/compat/regex/regcomp.c b/compat/regex/regcomp.c
> index d8bde06f1a..a1fb2e400e 100644
> --- a/compat/regex/regcomp.c
> +++ b/compat/regex/regcomp.c
> @@ -1,5 +1,12 @@
> +/*
> + * This is git.git's copy of gawk.git's regex engine. Please see that
> + * project for the latest version & to submit patches to this code,
> + * and git.git's compat/regex/README for information on how git's copy
> + * of this code is maintained.
> + */
> +
>  /* Extended regular expression matching and search library.
> -   Copyright (C) 2002-2007,2009,2010 Free Software Foundation, Inc.
> +   Copyright (C) 2002-2016 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
>     Contributed by Isamu Hasegawa <isamu@yamato.ibm.com>.
>

GNU C Library 2016? Is this GPL v3 code? that would be incompatible with 
GPL v2, I think.

> @@ -14,9 +21,20 @@
>     Lesser General Public License for more details.
>
>     You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, write to the Free
> -   Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> -   02110-1301 USA.  */
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> ...

-- Hannes


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 8/7] fixup! compat/regex: update the gawk regex engine from upstream
  2017-05-05  5:54   ` Johannes Sixt
@ 2017-05-05  6:12     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-05  6:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins,
	Ævar Arnfjörð Bjarmason

---

On Fri, May 5, 2017 at 7:54 AM, Johannes Sixt <j6t@kdbg.org> wrote:
> GNU C Library 2016? Is this GPL v3 code? that would be incompatible with GPL
> v2, I think.

No, it's all under LGPL 2.1, the glibc license has not changed since
2010.

As noted in patch 02 one amendmend to one of the LGPL 2.1 files from
glibc requires a GPLv3 header, this is a change made by Gawk in
importing it from glibc, but I removed that dependency.

But I forgot to subsequently git rm the relevant header file & amend
the monkeypatch. Here's a !fixup on top for that.

Junio: Just sending this one patch of v2, if you'd like to fetch the
whole thing it's import-2017-upstream-gawk-regex-engine-2 on
github.com/avar/git

 .../0001-Add-notice-at-top-of-copied-files.patch   |  15 --
 compat/regex/verify.h                              | 286 ---------------------
 2 files changed, 301 deletions(-)
 delete mode 100644 compat/regex/verify.h

diff --git a/compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch b/compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch
index 4b4acc45ba..4f82185174 100644
--- a/compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch
+++ b/compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch
@@ -103,18 +103,3 @@ index c8f11e52e7..c79ff38b1c 100644
  /* Extended regular expression matching and search library.
     Copyright (C) 2002-2016 Free Software Foundation, Inc.
     This file is part of the GNU C Library.
-diff --git a/compat/regex/verify.h b/compat/regex/verify.h
-index 5c8381d290..e865af5298 100644
---- a/compat/regex/verify.h
-+++ b/compat/regex/verify.h
-@@ -1,3 +1,10 @@
-+/*
-+ * This is git.git's copy of gawk.git's regex engine. Please see that
-+ * project for the latest version & to submit patches to this code,
-+ * and git.git's compat/regex/README for information on how git's copy
-+ * of this code is maintained.
-+ */
-+
- /* Compile-time assert-like macros.
- 
-    Copyright (C) 2005-2006, 2009-2016 Free Software Foundation, Inc.
diff --git a/compat/regex/verify.h b/compat/regex/verify.h
deleted file mode 100644
index e865af5298..0000000000
--- a/compat/regex/verify.h
+++ /dev/null
@@ -1,286 +0,0 @@
-/*
- * This is git.git's copy of gawk.git's regex engine. Please see that
- * project for the latest version & to submit patches to this code,
- * and git.git's compat/regex/README for information on how git's copy
- * of this code is maintained.
- */
-
-/* Compile-time assert-like macros.
-
-   Copyright (C) 2005-2006, 2009-2016 Free Software Foundation, Inc.
-
-   This program is free software: you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
-
-/* Written by Paul Eggert, Bruno Haible, and Jim Meyering.  */
-
-#ifndef _GL_VERIFY_H
-#define _GL_VERIFY_H
-
-
-/* Define _GL_HAVE__STATIC_ASSERT to 1 if _Static_assert works as per C11.
-   This is supported by GCC 4.6.0 and later, in C mode, and its use
-   here generates easier-to-read diagnostics when verify (R) fails.
-
-   Define _GL_HAVE_STATIC_ASSERT to 1 if static_assert works as per C++11.
-   This will likely be supported by future GCC versions, in C++ mode.
-
-   Use this only with GCC.  If we were willing to slow 'configure'
-   down we could also use it with other compilers, but since this
-   affects only the quality of diagnostics, why bother?  */
-#if (4 < __GNUC__ + (6 <= __GNUC_MINOR__) \
-     && (201112L <= __STDC_VERSION__  || !defined __STRICT_ANSI__) \
-     && !defined __cplusplus)
-# define _GL_HAVE__STATIC_ASSERT 1
-#endif
-/* The condition (99 < __GNUC__) is temporary, until we know about the
-   first G++ release that supports static_assert.  */
-#if (99 < __GNUC__) && defined __cplusplus
-# define _GL_HAVE_STATIC_ASSERT 1
-#endif
-
-/* FreeBSD 9.1 <sys/cdefs.h>, included by <stddef.h> and lots of other
-   system headers, defines a conflicting _Static_assert that is no
-   better than ours; override it.  */
-#ifndef _GL_HAVE_STATIC_ASSERT
-# include <stddef.h>
-# undef _Static_assert
-#endif
-
-/* Each of these macros verifies that its argument R is nonzero.  To
-   be portable, R should be an integer constant expression.  Unlike
-   assert (R), there is no run-time overhead.
-
-   If _Static_assert works, verify (R) uses it directly.  Similarly,
-   _GL_VERIFY_TRUE works by packaging a _Static_assert inside a struct
-   that is an operand of sizeof.
-
-   The code below uses several ideas for C++ compilers, and for C
-   compilers that do not support _Static_assert:
-
-   * The first step is ((R) ? 1 : -1).  Given an expression R, of
-     integral or boolean or floating-point type, this yields an
-     expression of integral type, whose value is later verified to be
-     constant and nonnegative.
-
-   * Next this expression W is wrapped in a type
-     struct _gl_verify_type {
-       unsigned int _gl_verify_error_if_negative: W;
-     }.
-     If W is negative, this yields a compile-time error.  No compiler can
-     deal with a bit-field of negative size.
-
-     One might think that an array size check would have the same
-     effect, that is, that the type struct { unsigned int dummy[W]; }
-     would work as well.  However, inside a function, some compilers
-     (such as C++ compilers and GNU C) allow local parameters and
-     variables inside array size expressions.  With these compilers,
-     an array size check would not properly diagnose this misuse of
-     the verify macro:
-
-       void function (int n) { verify (n < 0); }
-
-   * For the verify macro, the struct _gl_verify_type will need to
-     somehow be embedded into a declaration.  To be portable, this
-     declaration must declare an object, a constant, a function, or a
-     typedef name.  If the declared entity uses the type directly,
-     such as in
-
-       struct dummy {...};
-       typedef struct {...} dummy;
-       extern struct {...} *dummy;
-       extern void dummy (struct {...} *);
-       extern struct {...} *dummy (void);
-
-     two uses of the verify macro would yield colliding declarations
-     if the entity names are not disambiguated.  A workaround is to
-     attach the current line number to the entity name:
-
-       #define _GL_CONCAT0(x, y) x##y
-       #define _GL_CONCAT(x, y) _GL_CONCAT0 (x, y)
-       extern struct {...} * _GL_CONCAT (dummy, __LINE__);
-
-     But this has the problem that two invocations of verify from
-     within the same macro would collide, since the __LINE__ value
-     would be the same for both invocations.  (The GCC __COUNTER__
-     macro solves this problem, but is not portable.)
-
-     A solution is to use the sizeof operator.  It yields a number,
-     getting rid of the identity of the type.  Declarations like
-
-       extern int dummy [sizeof (struct {...})];
-       extern void dummy (int [sizeof (struct {...})]);
-       extern int (*dummy (void)) [sizeof (struct {...})];
-
-     can be repeated.
-
-   * Should the implementation use a named struct or an unnamed struct?
-     Which of the following alternatives can be used?
-
-       extern int dummy [sizeof (struct {...})];
-       extern int dummy [sizeof (struct _gl_verify_type {...})];
-       extern void dummy (int [sizeof (struct {...})]);
-       extern void dummy (int [sizeof (struct _gl_verify_type {...})]);
-       extern int (*dummy (void)) [sizeof (struct {...})];
-       extern int (*dummy (void)) [sizeof (struct _gl_verify_type {...})];
-
-     In the second and sixth case, the struct type is exported to the
-     outer scope; two such declarations therefore collide.  GCC warns
-     about the first, third, and fourth cases.  So the only remaining
-     possibility is the fifth case:
-
-       extern int (*dummy (void)) [sizeof (struct {...})];
-
-   * GCC warns about duplicate declarations of the dummy function if
-     -Wredundant-decls is used.  GCC 4.3 and later have a builtin
-     __COUNTER__ macro that can let us generate unique identifiers for
-     each dummy function, to suppress this warning.
-
-   * This implementation exploits the fact that older versions of GCC,
-     which do not support _Static_assert, also do not warn about the
-     last declaration mentioned above.
-
-   * GCC warns if -Wnested-externs is enabled and verify() is used
-     within a function body; but inside a function, you can always
-     arrange to use verify_expr() instead.
-
-   * In C++, any struct definition inside sizeof is invalid.
-     Use a template type to work around the problem.  */
-
-/* Concatenate two preprocessor tokens.  */
-#define _GL_CONCAT(x, y) _GL_CONCAT0 (x, y)
-#define _GL_CONCAT0(x, y) x##y
-
-/* _GL_COUNTER is an integer, preferably one that changes each time we
-   use it.  Use __COUNTER__ if it works, falling back on __LINE__
-   otherwise.  __LINE__ isn't perfect, but it's better than a
-   constant.  */
-#if defined __COUNTER__ && __COUNTER__ != __COUNTER__
-# define _GL_COUNTER __COUNTER__
-#else
-# define _GL_COUNTER __LINE__
-#endif
-
-/* Generate a symbol with the given prefix, making it unique if
-   possible.  */
-#define _GL_GENSYM(prefix) _GL_CONCAT (prefix, _GL_COUNTER)
-
-/* Verify requirement R at compile-time, as an integer constant expression
-   that returns 1.  If R is false, fail at compile-time, preferably
-   with a diagnostic that includes the string-literal DIAGNOSTIC.  */
-
-#define _GL_VERIFY_TRUE(R, DIAGNOSTIC) \
-   (!!sizeof (_GL_VERIFY_TYPE (R, DIAGNOSTIC)))
-
-#ifdef __cplusplus
-# if !GNULIB_defined_struct__gl_verify_type
-template <int w>
-  struct _gl_verify_type {
-    unsigned int _gl_verify_error_if_negative: w;
-  };
-#  define GNULIB_defined_struct__gl_verify_type 1
-# endif
-# define _GL_VERIFY_TYPE(R, DIAGNOSTIC) \
-    _gl_verify_type<(R) ? 1 : -1>
-#elif defined _GL_HAVE__STATIC_ASSERT
-# define _GL_VERIFY_TYPE(R, DIAGNOSTIC) \
-    struct {                                   \
-      _Static_assert (R, DIAGNOSTIC);          \
-      int _gl_dummy;                          \
-    }
-#else
-# define _GL_VERIFY_TYPE(R, DIAGNOSTIC) \
-    struct { unsigned int _gl_verify_error_if_negative: (R) ? 1 : -1; }
-#endif
-
-/* Verify requirement R at compile-time, as a declaration without a
-   trailing ';'.  If R is false, fail at compile-time, preferably
-   with a diagnostic that includes the string-literal DIAGNOSTIC.
-
-   Unfortunately, unlike C11, this implementation must appear as an
-   ordinary declaration, and cannot appear inside struct { ... }.  */
-
-#ifdef _GL_HAVE__STATIC_ASSERT
-# define _GL_VERIFY _Static_assert
-#else
-# define _GL_VERIFY(R, DIAGNOSTIC)				       \
-    extern int (*_GL_GENSYM (_gl_verify_function) (void))	       \
-      [_GL_VERIFY_TRUE (R, DIAGNOSTIC)]
-#endif
-
-/* _GL_STATIC_ASSERT_H is defined if this code is copied into assert.h.  */
-#ifdef _GL_STATIC_ASSERT_H
-# if !defined _GL_HAVE__STATIC_ASSERT && !defined _Static_assert
-#  define _Static_assert(R, DIAGNOSTIC) _GL_VERIFY (R, DIAGNOSTIC)
-# endif
-# if !defined _GL_HAVE_STATIC_ASSERT && !defined static_assert
-#  define static_assert _Static_assert /* C11 requires this #define.  */
-# endif
-#endif
-
-/* @assert.h omit start@  */
-
-/* Each of these macros verifies that its argument R is nonzero.  To
-   be portable, R should be an integer constant expression.  Unlike
-   assert (R), there is no run-time overhead.
-
-   There are two macros, since no single macro can be used in all
-   contexts in C.  verify_true (R) is for scalar contexts, including
-   integer constant expression contexts.  verify (R) is for declaration
-   contexts, e.g., the top level.  */
-
-/* Verify requirement R at compile-time, as an integer constant expression.
-   Return 1.  This is equivalent to verify_expr (R, 1).
-
-   verify_true is obsolescent; please use verify_expr instead.  */
-
-#define verify_true(R) _GL_VERIFY_TRUE (R, "verify_true (" #R ")")
-
-/* Verify requirement R at compile-time.  Return the value of the
-   expression E.  */
-
-#define verify_expr(R, E) \
-   (_GL_VERIFY_TRUE (R, "verify_expr (" #R ", " #E ")") ? (E) : (E))
-
-/* Verify requirement R at compile-time, as a declaration without a
-   trailing ';'.  */
-
-#define verify(R) _GL_VERIFY (R, "verify (" #R ")")
-
-#ifndef __has_builtin
-# define __has_builtin(x) 0
-#endif
-
-/* Assume that R always holds.  This lets the compiler optimize
-   accordingly.  R should not have side-effects; it may or may not be
-   evaluated.  Behavior is undefined if R is false.  */
-
-#if (__has_builtin (__builtin_unreachable) \
-     || 4 < __GNUC__ + (5 <= __GNUC_MINOR__))
-# define assume(R) ((R) ? (void) 0 : __builtin_unreachable ())
-#elif 1200 <= _MSC_VER
-# define assume(R) __assume (R)
-#elif ((defined GCC_LINT || defined lint) \
-       && (__has_builtin (__builtin_trap) \
-           || 3 < __GNUC__ + (3 < __GNUC_MINOR__ + (4 <= __GNUC_PATCHLEVEL__))))
-  /* Doing it this way helps various packages when configured with
-     --enable-gcc-warnings, which compiles with -Dlint.  It's nicer
-     when 'assume' silences warnings even with older GCCs.  */
-# define assume(R) ((R) ? (void) 0 : __builtin_trap ())
-#else
-# define assume(R) ((void) (0 && (R)))
-#endif
-
-/* @assert.h omit end@  */
-
-#endif
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/7] Update the compat/regex engine from upstream
  2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
                   ` (6 preceding siblings ...)
  2017-05-04 22:00 ` [PATCH 7/7] " Ævar Arnfjörð Bjarmason
@ 2017-05-08  0:55 ` Junio C Hamano
  2017-05-08  6:38   ` Ævar Arnfjörð Bjarmason
  2017-05-12  0:31 ` Junio C Hamano
  8 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2017-05-08  0:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> See the first patch for motivation & why.

I do not think it quite explains the motivation, though.  "Doing
this way, we can keep in sync with the upstream more easily"?  Or is
there anything more to it, e.g. "and we need to update from the
upstream now to fix this and that issues"?

Having these "fixup!" as separate patches on the list makes them
smaller and easier to understand.  What do we want to do with them
once they are applied?  Squash them all up, because these do not
have their own explanations in their log message and it is not worth
keeping them separate?

> The only reason this has a cover letter is to explain the !fixup
> commits. IIRC the mailing list has a 100K limit, which this series
> would violate, so I split up the second commit.
>
> Consider all these !fixup commits to have by Signed-off-by, easier to
> say that here than to modify them all.
>
> Ævar Arnfjörð Bjarmason (7):
>   compat/regex: add a README with a maintenance guide
>   compat/regex: update the gawk regex engine from upstream
>   fixup! compat/regex: update the gawk regex engine from upstream
>   fixup! compat/regex: update the gawk regex engine from upstream
>   fixup! compat/regex: update the gawk regex engine from upstream
>   fixup! compat/regex: update the gawk regex engine from upstream
>   fixup! compat/regex: update the gawk regex engine from upstream
>
>  Makefile                                           |   8 +-
>  compat/regex/README                                |  21 +
>  compat/regex/intprops.h                            | 448 +++++++++++++++++++++
>  .../0001-Add-notice-at-top-of-copied-files.patch   | 120 ++++++
>  .../0002-Remove-verify.h-use-from-intprops.h.patch |  41 ++
>  compat/regex/regcomp.c                             | 356 +++++++++-------
>  compat/regex/regex.c                               |  32 +-
>  compat/regex/regex.h                               | 120 +++---
>  compat/regex/regex_internal.c                      | 118 +++---
>  compat/regex/regex_internal.h                      | 118 +++---
>  compat/regex/regexec.c                             | 242 +++++------
>  compat/regex/verify.h                              | 286 +++++++++++++
>  12 files changed, 1487 insertions(+), 423 deletions(-)
>  create mode 100644 compat/regex/README
>  create mode 100644 compat/regex/intprops.h
>  create mode 100644 compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch
>  create mode 100644 compat/regex/patches/0002-Remove-verify.h-use-from-intprops.h.patch
>  create mode 100644 compat/regex/verify.h

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/7] Update the compat/regex engine from upstream
  2017-05-08  0:55 ` [PATCH 0/7] Update the compat/regex " Junio C Hamano
@ 2017-05-08  6:38   ` Ævar Arnfjörð Bjarmason
  2017-05-08  7:03     ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-08  6:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins

On Mon, May 8, 2017 at 2:55 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> See the first patch for motivation & why.
>
> I do not think it quite explains the motivation, though.  "Doing
> this way, we can keep in sync with the upstream more easily"?  Or is
> there anything more to it, e.g. "and we need to update from the
> upstream now to fix this and that issues"?

It's just to bring us in line with upstream which as noted in that
commit has fixed the issues we locally patched (and more), and for us
to stop maintaining a fork of an old version of part of Gawk.

There's no known issue with the existing engine that I'm aware of
which impacts git, but given ~7 years of bugfixes & improvements
there's surely something.

> Having these "fixup!" as separate patches on the list makes them
> smaller and easier to understand.  What do we want to do with them
> once they are applied?  Squash them all up, because these do not
> have their own explanations in their log message and it is not worth
> keeping them separate?

Please squash them all up into this commit ("compat/regex: update the
gawk regex engine from upstream"), as noted having them as separate
patches was a hack to get around mailing list limits.

Also, un-squashed they'd break the NO_REGEX=Y build for a few commits,
which would be a pain during bisecting.

>> The only reason this has a cover letter is to explain the !fixup
>> commits. IIRC the mailing list has a 100K limit, which this series
>> would violate, so I split up the second commit.
>>
>> Consider all these !fixup commits to have by Signed-off-by, easier to
>> say that here than to modify them all.
>>
>> Ævar Arnfjörð Bjarmason (7):
>>   compat/regex: add a README with a maintenance guide
>>   compat/regex: update the gawk regex engine from upstream
>>   fixup! compat/regex: update the gawk regex engine from upstream
>>   fixup! compat/regex: update the gawk regex engine from upstream
>>   fixup! compat/regex: update the gawk regex engine from upstream
>>   fixup! compat/regex: update the gawk regex engine from upstream
>>   fixup! compat/regex: update the gawk regex engine from upstream
>>
>>  Makefile                                           |   8 +-
>>  compat/regex/README                                |  21 +
>>  compat/regex/intprops.h                            | 448 +++++++++++++++++++++
>>  .../0001-Add-notice-at-top-of-copied-files.patch   | 120 ++++++
>>  .../0002-Remove-verify.h-use-from-intprops.h.patch |  41 ++
>>  compat/regex/regcomp.c                             | 356 +++++++++-------
>>  compat/regex/regex.c                               |  32 +-
>>  compat/regex/regex.h                               | 120 +++---
>>  compat/regex/regex_internal.c                      | 118 +++---
>>  compat/regex/regex_internal.h                      | 118 +++---
>>  compat/regex/regexec.c                             | 242 +++++------
>>  compat/regex/verify.h                              | 286 +++++++++++++
>>  12 files changed, 1487 insertions(+), 423 deletions(-)
>>  create mode 100644 compat/regex/README
>>  create mode 100644 compat/regex/intprops.h
>>  create mode 100644 compat/regex/patches/0001-Add-notice-at-top-of-copied-files.patch
>>  create mode 100644 compat/regex/patches/0002-Remove-verify.h-use-from-intprops.h.patch
>>  create mode 100644 compat/regex/verify.h

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/7] Update the compat/regex engine from upstream
  2017-05-08  6:38   ` Ævar Arnfjörð Bjarmason
@ 2017-05-08  7:03     ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2017-05-08  7:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> It's just to bring us in line with upstream which as noted in that
> commit has fixed the issues we locally patched (and more), and for us
> to stop maintaining a fork of an old version of part of Gawk.
>
> There's no known issue with the existing engine that I'm aware of
> which impacts git, but given ~7 years of bugfixes & improvements
> there's surely something.
>
>> Having these "fixup!" as separate patches on the list makes them
>> smaller and easier to understand.  What do we want to do with them
>> once they are applied?  Squash them all up, because these do not
>> have their own explanations in their log message and it is not worth
>> keeping them separate?
>
> Please squash them all up into this commit ("compat/regex: update the
> gawk regex engine from upstream"), as noted having them as separate
> patches was a hack to get around mailing list limits.
>
> Also, un-squashed they'd break the NO_REGEX=Y build for a few commits,
> which would be a pain during bisecting.

OK.  Thanks for clarification.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/7] Update the compat/regex engine from upstream
  2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
                   ` (7 preceding siblings ...)
  2017-05-08  0:55 ` [PATCH 0/7] Update the compat/regex " Junio C Hamano
@ 2017-05-12  0:31 ` Junio C Hamano
  8 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2017-05-12  0:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, git, Johannes Sixt, Ramsay Jones,
	Stefano Lattarini, Ondřej Bílka, Arnold D . Robbins

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> See the first patch for motivation & why.
> ...

I do not necessarily agree with the upgrading strategy outlined in
1/7, but that is a separate issue.  There may be some other bits
that needs resurrecting out of "git log -p master compat/regex/",
but I cannot test them myself (the part changed by the following
patch have no effect in an environment where long is intptr_t, so
"make NO_REGEX=YesPlease" build does not fail for me), so I'm
letting Dscho's Windows builder find it out via Travis.

-- >8 --
Date: Fri, 12 May 2017 09:00:07 +0900
Subject: [PATCH] compat/regex: make it compile with -Werror=int-to-pointer-cast

The change by 56a1a3ab ("Silence GCC's "cast of pointer to integer
of a different size" warning", 2015-10-26) may need resurrecting; I
do not think an unprotected #include <stdint.h> is the best way to
do this, but for the purpose of places that needs further work,
thishsould do.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 compat/regex/regcomp.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/compat/regex/regcomp.c b/compat/regex/regcomp.c
index 5e9ea26cd4..5688a639bf 100644
--- a/compat/regex/regcomp.c
+++ b/compat/regex/regcomp.c
@@ -24,9 +24,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#ifdef HAVE_STDINT_H
 #include <stdint.h>
-#endif
 
 #ifdef HAVE_STRINGS_H
 #include <strings.h>
@@ -2641,7 +2639,7 @@ parse_dup_op (bin_tree_t *elem, re_string_t *regexp, re_dfa_t *dfa,
     old_tree = NULL;
 
   if (elem->token.type == SUBEXP)
-    postorder (elem, mark_opt_subexp, (void *) (long) elem->token.opr.idx);
+    postorder (elem, mark_opt_subexp, (void *) (intptr_t) elem->token.opr.idx);
 
   tree = create_tree (dfa, elem, NULL, (end == -1 ? OP_DUP_ASTERISK : OP_ALT));
   if (BE (tree == NULL, 0))
@@ -3868,7 +3866,7 @@ create_token_tree (re_dfa_t *dfa, bin_tree_t *left, bin_tree_t *right,
 static reg_errcode_t
 mark_opt_subexp (void *extra, bin_tree_t *node)
 {
-  int idx = (int) (long) extra;
+  int idx = (int) (intptr_t) extra;
   if (node->token.type == SUBEXP && node->token.opr.idx == idx)
     node->token.opt_subexp = 1;
 
-- 
2.13.0-334-gbb1c091dbc


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/7] compat/regex: add a README with a maintenance guide
  2017-05-04 22:00 ` [PATCH 1/7] compat/regex: add a README with a maintenance guide Ævar Arnfjörð Bjarmason
@ 2017-05-12  0:47   ` Junio C Hamano
  2017-05-12 10:15     ` Johannes Schindelin
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2017-05-12  0:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Johannes Sixt, Ramsay Jones, Stefano Lattarini,
	Ondřej Bílka, Arnold D . Robbins

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/compat/regex/README b/compat/regex/README
> new file mode 100644
> index 0000000000..345d322d8c
> --- /dev/null
> +++ b/compat/regex/README
> @@ -0,0 +1,21 @@
> +This is the Git project's copy of the GNU awk (Gawk) regex
> +engine. It's used when Git is build with e.g. NO_REGEX=NeedsStartEnd,
> +or when the C library's regular expression functions are otherwise
> +deficient.
> +
> +This is not a fork, but a source code copy. Upstream is the Gawk
> +project, and the sources should be periodically updated from their
> +copy, which can be done with:
> +
> +    for f in $(find . -name '*.[ch]' -printf "%f\n"); do wget http://git.savannah.gnu.org/cgit/gawk.git/plain/support/$f -O $f; done
> +
> +For ease of maintenance, and to intentionally make it inconvenient to
> +diverge from upstream (since it makes it harder to re-merge) any local
> +changes should be stored in the patches/ directory, which after doing
> +the above can be applied as:
> +
> +    for p in patches/*; do patch -p3 < $p; done
> +
> +For any changes that aren't specific to the git.git copy please submit
> +a patch to the Gawk project and/or to the GNU C library (the Gawk
> +regex engine is a periodically & forked copy from glibc.git).

I am not a huge fan of placing patch files under version control.

If I were doing the "code drop from the outside world from time to
time", I'd rather do the following every time we update:

 - have a topic branch for importing version N+1, and in its first
   commit, replace compat/regex/ with the pristine copy of the files
   we'll borrow from version N+1.

 - ask "git log -p compat/regex/" to grab all changes made to the
   directory, and stop at the commit that imported the pristine copy
   of the files we borrowed from version N.  These are the changes
   we made to the pristine copy of version N to adjust it to our
   needs.

 - cherry-pick these patches on the topic branch; some of them
   hopefully have been upstreamed, the remainder of the patches are
   presumably to adjust the code to our local needs.

 - make more changes, while still on the topic branch, to adjust the
   code to our local and current needs.

 - once the result becomes buildable and tests OK, merge it back to
   the mainline.

This may break bisectability, but I think it is OK (you should be
able to skip and test only first-parent chain, treating as if these
are squashed together into a single change).  The patch files your
approach is keeping will become the individual patches on the topic
branch, and will be explained and justified the same way as any
other patches in their commit log message.

Having said all that, since I am not expecting to be the primary one
working in this area, I'll let you (who I take to be volunteering to
be the one) pick the approach that you would find the easiest and
least error prone to handle this task.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/7] compat/regex: add a README with a maintenance guide
  2017-05-12  0:47   ` Junio C Hamano
@ 2017-05-12 10:15     ` Johannes Schindelin
  2017-05-12 20:59       ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Schindelin @ 2017-05-12 10:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Sixt,
	Ramsay Jones, Stefano Lattarini, Ondřej Bílka,
	Arnold D . Robbins

[-- Attachment #1: Type: text/plain, Size: 3720 bytes --]

Hi,

[replacing Ramsay's email address with a working one]

On Fri, 12 May 2017, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > diff --git a/compat/regex/README b/compat/regex/README
> > new file mode 100644
> > index 0000000000..345d322d8c
> > --- /dev/null
> > +++ b/compat/regex/README
> > @@ -0,0 +1,21 @@
> > +This is the Git project's copy of the GNU awk (Gawk) regex
> > +engine. It's used when Git is build with e.g. NO_REGEX=NeedsStartEnd,
> > +or when the C library's regular expression functions are otherwise
> > +deficient.
> > +
> > +This is not a fork, but a source code copy. Upstream is the Gawk
> > +project, and the sources should be periodically updated from their
> > +copy, which can be done with:
> > +
> > +    for f in $(find . -name '*.[ch]' -printf "%f\n"); do wget http://git.savannah.gnu.org/cgit/gawk.git/plain/support/$f -O $f; done
> > +
> > +For ease of maintenance, and to intentionally make it inconvenient to
> > +diverge from upstream (since it makes it harder to re-merge) any local
> > +changes should be stored in the patches/ directory, which after doing
> > +the above can be applied as:
> > +
> > +    for p in patches/*; do patch -p3 < $p; done
> > +
> > +For any changes that aren't specific to the git.git copy please submit
> > +a patch to the Gawk project and/or to the GNU C library (the Gawk
> > +regex engine is a periodically & forked copy from glibc.git).
> 
> I am not a huge fan of placing patch files under version control.
> 
> If I were doing the "code drop from the outside world from time to
> time", I'd rather do the following every time we update:
> 
>  - have a topic branch for importing version N+1, and in its first
>    commit, replace compat/regex/ with the pristine copy of the files
>    we'll borrow from version N+1.
> 
>  - ask "git log -p compat/regex/" to grab all changes made to the
>    directory, and stop at the commit that imported the pristine copy
>    of the files we borrowed from version N.  These are the changes
>    we made to the pristine copy of version N to adjust it to our
>    needs.
> 
>  - cherry-pick these patches on the topic branch; some of them
>    hopefully have been upstreamed, the remainder of the patches are
>    presumably to adjust the code to our local needs.
> 
>  - make more changes, while still on the topic branch, to adjust the
>    code to our local and current needs.
> 
>  - once the result becomes buildable and tests OK, merge it back to
>    the mainline.
> 
> This may break bisectability, but I think it is OK (you should be
> able to skip and test only first-parent chain, treating as if these
> are squashed together into a single change).  The patch files your
> approach is keeping will become the individual patches on the topic
> branch, and will be explained and justified the same way as any
> other patches in their commit log message.
> 
> Having said all that, since I am not expecting to be the primary one
> working in this area, I'll let you (who I take to be volunteering to
> be the one) pick the approach that you would find the easiest and
> least error prone to handle this task.

FWIW I agree that Junio's proposed strategy would make more sense, with
one addition of my own:

- rather than scraping the files from the CGit website (which does not
  guarantee that the first scraped file will be from the same revision as
  the last scraped file), I would very strongly prefer the files to be
  copied from a clone of gawk.git, and the gawk.git revision from which
  they were copied should be recorded in git.git's commit adding them.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/7] compat/regex: add a README with a maintenance guide
  2017-05-12 10:15     ` Johannes Schindelin
@ 2017-05-12 20:59       ` Junio C Hamano
  2017-05-14 19:14         ` arnold
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2017-05-12 20:59 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Sixt,
	Ramsay Jones, Stefano Lattarini, Ondřej Bílka,
	Arnold D . Robbins

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> - rather than scraping the files from the CGit website (which does not
>   guarantee that the first scraped file will be from the same revision as
>   the last scraped file), I would very strongly prefer the files to be
>   copied from a clone of gawk.git, and the gawk.git revision from which
>   they were copied should be recorded in git.git's commit adding them.

Wow, I didn't even notice that was how the "original" came about.
No question that we should be taking from a known-stable snapshot,
not from a moving target.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/7] compat/regex: add a README with a maintenance guide
  2017-05-12 20:59       ` Junio C Hamano
@ 2017-05-14 19:14         ` arnold
  2017-05-15  1:20           ` Junio C Hamano
  2017-05-15 12:14           ` Johannes Schindelin
  0 siblings, 2 replies; 21+ messages in thread
From: arnold @ 2017-05-14 19:14 UTC (permalink / raw)
  To: Johannes.Schindelin, gitster
  Cc: stefano.lattarini, ramsay, neleai, j6t, git, avarab, arnold

Junio C Hamano <gitster@pobox.com> wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > - rather than scraping the files from the CGit website (which does not
> >   guarantee that the first scraped file will be from the same revision as
> >   the last scraped file), I would very strongly prefer the files to be
> >   copied from a clone of gawk.git, and the gawk.git revision from which
> >   they were copied should be recorded in git.git's commit adding them.
>
> Wow, I didn't even notice that was how the "original" came about.
> No question that we should be taking from a known-stable snapshot,
> not from a moving target.

Gawk's regex has been fairly stable of late.  But marking the revision
from which you take the regex is a good idea in any case.

With respect to bug fixes that may have happened downstream, please do
let me know of any.  But I do request it as a bug report to bug-gawk@gnu.org
and not just a pull request with no commentary.

Thanks,

Arnold

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/7] compat/regex: add a README with a maintenance guide
  2017-05-14 19:14         ` arnold
@ 2017-05-15  1:20           ` Junio C Hamano
  2017-05-15 12:14           ` Johannes Schindelin
  1 sibling, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2017-05-15  1:20 UTC (permalink / raw)
  To: arnold
  Cc: Johannes.Schindelin, stefano.lattarini, ramsay, neleai, j6t, git,
	avarab

arnold@skeeve.com writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > - rather than scraping the files from the CGit website (which does not
>> >   guarantee that the first scraped file will be from the same revision as
>> >   the last scraped file), I would very strongly prefer the files to be
>> >   copied from a clone of gawk.git, and the gawk.git revision from which
>> >   they were copied should be recorded in git.git's commit adding them.
>>
>> Wow, I didn't even notice that was how the "original" came about.
>> No question that we should be taking from a known-stable snapshot,
>> not from a moving target.
>
> Gawk's regex has been fairly stable of late.  But marking the revision
> from which you take the regex is a good idea in any case.

I do not mind taking a snapshot from an untagged commit (instead of
sticking to the last tagged commit, being suspicious about newer
developments).  What I was reacting to was a loop like this:

    for f in $(find . -name '*.[ch]' -printf "%f\n")
    do wget http://git.savannah.gnu.org/cgit/gawk.git/plain/support/$f -O $f
    done

i.e. allowing wget to grab paths out of possibly different commits.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/7] compat/regex: add a README with a maintenance guide
  2017-05-14 19:14         ` arnold
  2017-05-15  1:20           ` Junio C Hamano
@ 2017-05-15 12:14           ` Johannes Schindelin
  2017-05-15 12:51             ` arnold
  1 sibling, 1 reply; 21+ messages in thread
From: Johannes Schindelin @ 2017-05-15 12:14 UTC (permalink / raw)
  To: arnold; +Cc: gitster, stefano.lattarini, ramsay, neleai, j6t, git, avarab

[-- Attachment #1: Type: text/plain, Size: 2441 bytes --]

Hi Arnold,

On Sun, 14 May 2017, arnold@skeeve.com wrote:

> With respect to bug fixes that may have happened downstream, please do
> let me know of any.  But I do request it as a bug report to
> bug-gawk@gnu.org and not just a pull request with no commentary.

I dabbled with updating our compat/regex/ myself, a while ago, and just
found my notes. Note: at least some of these notes should help with the
next iteration of Ævar's patch series.

First of all, our original import could have been accompanied by better
documentation what was done. Granted, back then gawk was still maintained
in CVS, so things would have been a little tougher with regard to, say,
specifying which gawk revision was imported. In the meantime, gawk uses a
Git repository, though: http://git.savannah.gnu.org/r/gawk.git. Therefore,
we can say pretty precisely that gawk's 40b3741f (Bring in development
gawk changes., 2010-11-12)) was imported into Git as per d18f76dccf
(compat/regex: use the regex engine from gawk for compat, 2010-08-17).

My approach of updating compat/regex/ differed from Ævar's in that I
checked out that Git commit, applied the interdiff to gawk's newest
commit, and rebased that onto the current commit of Git. But I think Ævar
& Junio's approach (replace compat/regex/ wholesale by the newest gawk
revision's files, then re-apply clean patches of our `git log 40b3741f..
-- compat/regex/` on top, as individual commits) is saner, as it will make
future updates substantially easier.

With my approach, I still had 16 merge conflicts, pointing in large part
to changes we do *not* want to contribute back: gawk's code style differs
from ours, and we adjusted the files in compat/regex/ to ours (which I
think was a mistake).

I also reinstated support for compiling with NO_MBSUPPORT, which included
a new guard of the btowc() definition.

I also had to reintroduce explicit #defines of bool, true and false, as
gawk's source code split those out into their own header file.

I apparently also "skipped a guarded #include <stddef.h> that was not
actually necessary, but simply a late fixup to a997bf423d (compat/regex:
get the gawk regex engine to compile within git, 2010-08-17)", but I do
not remember what that was about.

In summary, I do not think that any of our patches should go "upstream"
into gawk's source code, as they are pretty specific to Git's needs.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/7] compat/regex: add a README with a maintenance guide
  2017-05-15 12:14           ` Johannes Schindelin
@ 2017-05-15 12:51             ` arnold
  0 siblings, 0 replies; 21+ messages in thread
From: arnold @ 2017-05-15 12:51 UTC (permalink / raw)
  To: Johannes.Schindelin, arnold
  Cc: stefano.lattarini, ramsay, neleai, j6t, gitster, git, avarab

Hi Johannes,

Thanks for the info!  I appreciate the background.

In the future if you folks do find a bug, please let me know.

Thanks!

Arnold

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:

> Hi Arnold,
>
> On Sun, 14 May 2017, arnold@skeeve.com wrote:
>
> > With respect to bug fixes that may have happened downstream, please do
> > let me know of any.  But I do request it as a bug report to
> > bug-gawk@gnu.org and not just a pull request with no commentary.
>
> I dabbled with updating our compat/regex/ myself, a while ago, and just
> found my notes. Note: at least some of these notes should help with the
> next iteration of Ævar's patch series.
>
> First of all, our original import could have been accompanied by better
> documentation what was done. Granted, back then gawk was still maintained
> in CVS, so things would have been a little tougher with regard to, say,
> specifying which gawk revision was imported. In the meantime, gawk uses a
> Git repository, though: http://git.savannah.gnu.org/r/gawk.git. Therefore,
> we can say pretty precisely that gawk's 40b3741f (Bring in development
> gawk changes., 2010-11-12)) was imported into Git as per d18f76dccf
> (compat/regex: use the regex engine from gawk for compat, 2010-08-17).
>
> My approach of updating compat/regex/ differed from Ævar's in that I
> checked out that Git commit, applied the interdiff to gawk's newest
> commit, and rebased that onto the current commit of Git. But I think Ævar
> & Junio's approach (replace compat/regex/ wholesale by the newest gawk
> revision's files, then re-apply clean patches of our `git log 40b3741f..
> -- compat/regex/` on top, as individual commits) is saner, as it will make
> future updates substantially easier.
>
> With my approach, I still had 16 merge conflicts, pointing in large part
> to changes we do *not* want to contribute back: gawk's code style differs
> from ours, and we adjusted the files in compat/regex/ to ours (which I
> think was a mistake).
>
> I also reinstated support for compiling with NO_MBSUPPORT, which included
> a new guard of the btowc() definition.
>
> I also had to reintroduce explicit #defines of bool, true and false, as
> gawk's source code split those out into their own header file.
>
> I apparently also "skipped a guarded #include <stddef.h> that was not
> actually necessary, but simply a late fixup to a997bf423d (compat/regex:
> get the gawk regex engine to compile within git, 2010-08-17)", but I do
> not remember what that was about.
>
> In summary, I do not think that any of our patches should go "upstream"
> into gawk's source code, as they are pretty specific to Git's needs.
>
> Ciao,
> Johannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-05-15 12:57 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-04 22:00 [PATCH 0/7] Update the compat/regex engine from upstream Ævar Arnfjörð Bjarmason
2017-05-04 22:00 ` [PATCH 1/7] compat/regex: add a README with a maintenance guide Ævar Arnfjörð Bjarmason
2017-05-12  0:47   ` Junio C Hamano
2017-05-12 10:15     ` Johannes Schindelin
2017-05-12 20:59       ` Junio C Hamano
2017-05-14 19:14         ` arnold
2017-05-15  1:20           ` Junio C Hamano
2017-05-15 12:14           ` Johannes Schindelin
2017-05-15 12:51             ` arnold
2017-05-04 22:00 ` [PATCH 2/7] compat/regex: update the gawk regex engine from upstream Ævar Arnfjörð Bjarmason
2017-05-04 22:00 ` [PATCH 3/7] fixup! " Ævar Arnfjörð Bjarmason
2017-05-05  5:54   ` Johannes Sixt
2017-05-05  6:12     ` [PATCH v2 8/7] " Ævar Arnfjörð Bjarmason
2017-05-04 22:00 ` [PATCH 4/7] " Ævar Arnfjörð Bjarmason
2017-05-04 22:00 ` [PATCH 5/7] " Ævar Arnfjörð Bjarmason
2017-05-04 22:00 ` [PATCH 6/7] " Ævar Arnfjörð Bjarmason
2017-05-04 22:00 ` [PATCH 7/7] " Ævar Arnfjörð Bjarmason
2017-05-08  0:55 ` [PATCH 0/7] Update the compat/regex " Junio C Hamano
2017-05-08  6:38   ` Ævar Arnfjörð Bjarmason
2017-05-08  7:03     ` Junio C Hamano
2017-05-12  0:31 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).