git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] Fix a problem with PCRE2 and nedmalloc, found via Azure Pipelines
@ 2019-08-05 11:51 Johannes Schindelin via GitGitGadget
  2019-08-05 11:51 ` [PATCH 1/1] pcre2: allow overriding the system allocator Johannes Schindelin via GitGitGadget
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-08-05 11:51 UTC (permalink / raw)
  To: git; +Cc: Carlo Marcelo Arenas Belón, Junio C Hamano

For a couple of days, maybe even a week, pu fails consistently, in the
Windows job where it tests t7816. The symptom is a segmentation fault.

I finally got to diagnose this, and it looked at first as if there was yet
another buffer overrun that was so small that valgrind failed to detect it
(see https://github.com/gitgitgadget/git/pull/178). The problem is another
one, though: we ask PCRE2 to allocate a table (and it uses the system
allocator for that), and then try to release it using nedmalloc's free() 
replacement.

This is squarely a problem with cb/pcre2-chartables-leakfix in conjunction
with an overridden allocator.

Junio, for your convenience, I rebased this patch directly on top of 
ab/pcre-v2 and pushed out the let-pcre2-respect-nedmalloc branch at 
https://github.com/dscho/git ready to be pulled. The rebased version is not
technically a bug fix, as I do not see any way that ab/pcre-v2 uses
mismatched allocators for malloc()/free(), but just in case that you wanted
to have it in v2.23.0 and not on top cb/pcre2-chartables-leakfix...

Johannes Schindelin (1):
  pcre2: allow overriding the system allocator

 grep.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)


base-commit: 7d3bf76999452ff64c84cec25fd95a7a95744b78
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-306%2Fdscho%2Ffix-for-pcre-chartables-leakfix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-306/dscho/fix-for-pcre-chartables-leakfix-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/306
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread
* [PATCH 0/3] Revive 'pcre2-chartables-leakfix'
@ 2019-10-16 12:06 Johannes Schindelin via GitGitGadget
  2019-10-16 12:06 ` [PATCH 2/3] grep: make PCRE2 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-10-16 12:06 UTC (permalink / raw)
  To: git; +Cc: Carlo Marcelo Arenas Belón, Johannes Schindelin,
	Junio C Hamano

As I had stated several times, I was really unhappy how the original fix
harped on nedmalloc and totally ignored runtime-configured custom
allocators.

So this is, at long last, my attempt to give this a new life. It is based
off of maint and needs trivial merge conflict resolutions relative to master
.

Range-diff relative to cb/pcre2-chartables-leakfix:

1:  770584d697e ! 1:  4feb8cc83a6 grep: make PCRE1 aware of custom allocator
    @@ Commit message
         to override the system alocator, and so it is incompatible with
         USE_NED_ALLOCATOR as reported by Dscho[1] (in similar code from PCRE2)

    +    Note that nedmalloc, as well as other custom allocators like jemalloc
    +    and mi-malloc, can be configured at runtime (via `LD_PRELOAD`),
    +    therefore we cannot know at compile time whether a custom allocator is
    +    used or not.
    +
         Make the minimum change possible to ensure this combination is supported
    -    by extending grep_init to set the PCRE1 specific functions to the NED
    -    versions (using the aliased names though) and therefore making sure all
    -    alocations are done inside PCRE1 with the same allocator than the rest
    -    of git.
    +    by extending `grep_init()` to set the PCRE1 specific functions to Git's
    +    idea of `malloc()` and `free()` and therefore making sure all
    +    allocations are done inside PCRE1 with the same allocator than the rest
    +    of Git.

    -    This change might have performance impact (hopefully for the best) and
    -    so will be good idea to test it in a platform where NED might have a
    -    positive impact (ex: Windows 7)
    +    This change has negligible performance impact: PCRE needs to allocate
    +    memory once per program run for the character table and for each pattern
    +    compilation. These are both rare events compared to matching patterns
    +    against lines. Actual measurements[2] show that the impact is lost in
    +    the noise.

         [1] https://public-inbox.org/git/pull.306.git.gitgitgadget@gmail.com
    +    [2] https://public-inbox.org/git/7f42007f-911b-c570-17f6-1c6af0429586@web.de

         Signed-off-by: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -
    - ## Makefile ##
    -@@ Makefile: ifdef NATIVE_CRLF
    - endif
    - 
    - ifdef USE_NED_ALLOCATOR
    --    COMPAT_CFLAGS += -Icompat/nedmalloc
    -+    COMPAT_CFLAGS += -DUSE_NED_ALLOCATOR -Icompat/nedmalloc
    -     COMPAT_OBJS += compat/nedmalloc/nedmalloc.o
    -     OVERRIDE_STRDUP = YesPlease
    - endif
    +    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

      ## grep.c ##
     @@ grep.c: int grep_config(const char *var, const char *value, void *cb)
    @@ grep.c: int grep_config(const char *var, const char *value, void *cb)
       * default values from the template we read the configuration
       * information in an earlier call to git_config(grep_config).
     + *
    -+ * If using PCRE make sure that the library is configured
    -+ * to use the right allocator (ex: NED)
    ++ * If using PCRE, make sure that the library is configured
    ++ * to use the same allocator as Git (e.g. nedmalloc on Windows).
       */
      void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix)
      {
          struct grep_opt *def = &grep_defaults;
          int i;

    -+#ifdef USE_NED_ALLOCATOR
     +#ifdef USE_LIBPCRE1
     +    pcre_malloc = malloc;
     +    pcre_free = free;
     +#endif
    -+#endif
     +
          memset(opt, 0, sizeof(*opt));
          opt->repo = repo;
2:  cf8d36f34a2 ! 2:  39ad220c18b grep: make PCRE2 aware of custom allocator
    @@
      ## Metadata ##
    -Author: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com>
    +Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>

      ## Commit message ##
         grep: make PCRE2 aware of custom allocator

         94da9193a6 (grep: add support for PCRE v2, 2017-06-01) didn't include
         a way to override the system allocator, and so it is incompatible with
    -    USE_NED_ALLOCATOR.  The problem was made visible when an attempt to
    -    avoid a leak in a data structure that is created by the library was
    -    passed to NED's free for disposal triggering a segfault in Windows.
    +    custom allocators (e.g. nedmalloc). This problem became obvious when we
    +    tried to plug a memory leak by `free()`ing a data structure allocated by
    +    PCRE2, triggering a segfault in Windows (where we use nedmalloc by
    +    default).

         PCRE2 requires the use of a general context to override the allocator
         and therefore, there is a lot more code needed than in PCRE1, including
    @@ Commit message
         Extend the grep API with a "destructor" that could be called to cleanup
         any objects that were created and used globally.

    -    Update builtin/grep to use that new API, but any other future
    -    users should make sure to have matching grep_init/grep_destroy calls
    -    if they are using the pattern matching functionality (currently only
    -    relevant when using both NED and PCRE2)
    +    Update `builtin/grep.c` to use that new API, but any other future users
    +    should make sure to have matching `grep_init()`/`grep_destroy()` calls
    +    if they are using the pattern matching functionality.

         Move some of the logic that was before done per thread (in the workers)
    -    into an earlier phase to avoid degrading performance, but as the use
    -    of PCRE2 with NED is better understood it is expected more of its
    -    functions will be instructed to use the custom allocator as well as
    +    into an earlier phase to avoid degrading performance, but as the use of
    +    PCRE2 with custom allocators is better understood it is expected more of
    +    its functions will be instructed to use the custom allocator as well as
         was done in the original code[1] this work was based on.

         [1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/

         Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
         Signed-off-by: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    +    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

      ## builtin/grep.c ##
     @@ builtin/grep.c: int cmd_grep(int argc, const char **argv, const char *prefix)
    @@ grep.c: static int grep_source_is_binary(struct grep_source *gs,
     +#ifdef USE_LIBPCRE2
     +static pcre2_general_context *pcre2_global_context;
     +
    -+#ifdef USE_NED_ALLOCATOR
     +static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data)
     +{
     +    return malloc(size);
    @@ grep.c: static int grep_source_is_binary(struct grep_source *gs,
     +    return free(pointer);
     +}
     +#endif
    -+#endif
     +
      static const char *color_grep_slots[] = {
          [GREP_COLOR_CONTEXT]        = "context",
          [GREP_COLOR_FILENAME]        = "filename",
     @@ grep.c: int grep_config(const char *var, const char *value, void *cb)
       *
    -  * If using PCRE make sure that the library is configured
    -  * to use the right allocator (ex: NED)
    -+ * if any object is created it should be cleaned up in grep_destroy()
    +  * If using PCRE, make sure that the library is configured
    +  * to use the same allocator as Git (e.g. nedmalloc on Windows).
    ++ *
    ++ * Any allocated memory needs to be released in grep_destroy().
       */
      void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix)
      {
    +     struct grep_opt *def = &grep_defaults;
    +     int i;
    + 
    ++#if defined(USE_LIBPCRE2)
    ++    if (!pcre2_global_context)
    ++        pcre2_global_context = pcre2_general_context_create(
    ++                    pcre2_malloc, pcre2_free, NULL);
    ++#endif
    ++
    + #ifdef USE_LIBPCRE1
    +     pcre_malloc = malloc;
    +     pcre_free = free;
     @@ grep.c: void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix
              color_set(opt->colors[i], def->colors[i]);
      }
    @@ grep.c: void grep_init(struct grep_opt *opt, struct repository *repo, const char
      static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)
      {
          /*
    -@@ grep.c: void append_header_grep_pattern(struct grep_opt *opt,
    - void append_grep_pattern(struct grep_opt *opt, const char *pat,
    -              const char *origin, int no, enum grep_pat_token t)
    - {
    -+#if defined(USE_LIBPCRE2) && defined(USE_NED_ALLOCATOR)
    -+    if (!pcre2_global_context && opt->ignore_case && has_non_ascii(pat))
    -+        pcre2_global_context = pcre2_general_context_create(
    -+                    pcre2_malloc, pcre2_free, NULL);
    -+#endif
    -     append_grep_pat(opt, pat, strlen(pat), origin, no, t);
    - }
    - 
     @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt

          p->pcre2_compile_context = NULL;
    @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_
          if (opt->ignore_case) {
              if (has_non_ascii(p->pattern)) {
     -            character_tables = pcre2_maketables(NULL);
    -+#ifdef USE_NED_ALLOCATOR
     +            if (!pcre2_global_context)
     +                BUG("pcre2_global_context uninitialized");
    -+#endif
     +            character_tables = pcre2_maketables(pcre2_global_context);
                  p->pcre2_compile_context = pcre2_compile_context_create(NULL);
                  pcre2_set_character_tables(p->pcre2_compile_context, character_tables);
3:  54aef142151 ! 3:  3ced7c386a0 grep: avoid leak of chartables in PCRE2
    @@
      ## Metadata ##
    -Author: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com>
    +Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>

      ## Commit message ##
         grep: avoid leak of chartables in PCRE2
    @@ Commit message
         for PCRE1.

         Signed-off-by: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    +    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

      ## grep.c ##
     @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
    @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_
          int patinforet;
          size_t jitsizearg;
     @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
    +         if (has_non_ascii(p->pattern)) {
                  if (!pcre2_global_context)
                      BUG("pcre2_global_context uninitialized");
    - #endif
     -            character_tables = pcre2_maketables(pcre2_global_context);
     +            p->pcre2_tables = pcre2_maketables(pcre2_global_context);
                  p->pcre2_compile_context = pcre2_compile_context_create(NULL);

My merge conflict resolution with master
[https://github.com/dscho/git/commit/pcre2-chartables-leakfix-merge-with-master]
:

diff --cc grep.c
index 9556d13dc1d,0bb4cbd3d82..b7ae5a442a6
--- a/grep.c
+++ b/grep.c
@@@ -533,15 -470,11 +506,15 @@@ static void compile_pcre2_pattern(struc

      p->pcre2_compile_context = NULL;

 +    /* pcre2_global_context is initialized in append_grep_pattern */
      if (opt->ignore_case) {
-         if (has_non_ascii(p->pattern)) {
+         if (!opt->ignore_locale && has_non_ascii(p->pattern)) {
 -            character_tables = pcre2_maketables(NULL);
 +            if (!pcre2_global_context)
 +                BUG("pcre2_global_context uninitialized");
 +            p->pcre2_tables = pcre2_maketables(pcre2_global_context);
              p->pcre2_compile_context = pcre2_compile_context_create(NULL);
 -            pcre2_set_character_tables(p->pcre2_compile_context, character_tables);
 +            pcre2_set_character_tables(p->pcre2_compile_context,
 +                            p->pcre2_tables);
          }
          options |= PCRE2_CASELESS;
      }
@@@ -643,9 -571,6 +611,7 @@@ static void free_pcre2_pattern(struct g
      pcre2_compile_context_free(p->pcre2_compile_context);
      pcre2_code_free(p->pcre2_pattern);
      pcre2_match_data_free(p->pcre2_match_data);
-     pcre2_jit_stack_free(p->pcre2_jit_stack);
-     pcre2_match_context_free(p->pcre2_match_context);
 +    free((void *)p->pcre2_tables);
  }
  #else /* !USE_LIBPCRE2 */
  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
diff --cc grep.h
index 533165c21a5,05dc1bb98e5..811fd274c95
--- a/grep.h
+++ b/grep.h
@@@ -94,12 -78,9 +78,10 @@@ struct grep_pat 
      pcre2_code *pcre2_pattern;
      pcre2_match_data *pcre2_match_data;
      pcre2_compile_context *pcre2_compile_context;
-     pcre2_match_context *pcre2_match_context;
-     pcre2_jit_stack *pcre2_jit_stack;
 +    const uint8_t *pcre2_tables;
      uint32_t pcre2_jit_on;
-     kwset_t kws;
      unsigned fixed:1;
+     unsigned is_fixed:1;
      unsigned ignore_case:1;
      unsigned word_regexp:1;
  };

Carlo Marcelo Arenas Belón (2):
  grep: make PCRE1 aware of custom allocator
  grep: make PCRE2 aware of custom allocator

Johannes Schindelin (1):
  grep: avoid leak of chartables in PCRE2

 builtin/grep.c |  1 +
 grep.c         | 47 ++++++++++++++++++++++++++++++++++++++++++++---
 grep.h         |  2 ++
 3 files changed, 47 insertions(+), 3 deletions(-)


base-commit: 51cf315870bbb7254ddf06c84fe03b41bc48eebd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-402%2Fdscho%2Fpcre2-chartables-leakfix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-402/dscho/pcre2-chartables-leakfix-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/402
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2019-10-16 12:06 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-05 11:51 [PATCH 0/1] Fix a problem with PCRE2 and nedmalloc, found via Azure Pipelines Johannes Schindelin via GitGitGadget
2019-08-05 11:51 ` [PATCH 1/1] pcre2: allow overriding the system allocator Johannes Schindelin via GitGitGadget
2019-08-05 16:19   ` Carlo Arenas
2019-08-05 16:27     ` Carlo Arenas
2019-08-05 19:32       ` Johannes Schindelin
2019-08-05 19:26     ` Johannes Schindelin
2019-08-05 21:53     ` Junio C Hamano
2019-08-06  6:24       ` Carlo Arenas
2019-08-06  8:50       ` [PATCH 0/3] grep: no leaks (WIP) Carlo Marcelo Arenas Belón
2019-08-06  8:50         ` [PATCH 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón
2019-08-08 13:54           ` Johannes Schindelin
2019-08-08 15:19             ` Carlo Arenas
2019-08-06  8:50         ` [PATCH 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón
2019-08-08 13:56           ` Johannes Schindelin
2019-08-08 14:32             ` Carlo Arenas
2019-08-06  8:50         ` [PATCH 3/3] grep: avoid leak of chartables in PCRE2 Carlo Marcelo Arenas Belón
2019-08-06 16:36         ` [RFC PATCH v3 0/3] grep: no leaks or crashes (windows testing needed) Carlo Marcelo Arenas Belón
2019-08-06 16:36           ` [RFC PATCH v3 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón
2019-08-06 16:36           ` [RFC PATCH v3 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón
2019-08-07  5:38             ` René Scharfe
2019-08-07  9:49               ` Carlo Arenas
2019-08-07 13:02                 ` René Scharfe
2019-08-07 13:08                   ` [PATCH 1/2] nedmalloc: do assignments only after the declaration section René Scharfe
2019-08-07 13:09                   ` [PATCH 2/2] nedmalloc: avoid compiler warning about unused value René Scharfe
2019-08-08  2:35                   ` [RFC PATCH v3 2/3] grep: make PCRE2 aware of custom allocator Carlo Arenas
2019-08-08  7:07                     ` René Scharfe
2019-08-08 12:38                       ` Carlo Arenas
2019-08-08 14:29                         ` René Scharfe
2019-08-08 20:18                           ` Johannes Schindelin
2019-08-07 18:15                 ` Junio C Hamano
2019-08-06 16:36           ` [RFC PATCH v3 3/3] grep: avoid leak of chartables in PCRE2 Carlo Marcelo Arenas Belón
2019-08-06 16:48           ` [RFC PATCH v3 0/3] grep: no leaks or crashes (windows testing needed) Junio C Hamano
2019-08-07 21:39           ` [RFC PATCH v4 " Carlo Marcelo Arenas Belón
2019-08-07 21:39             ` [RFC PATCH v4 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón
2019-08-07 21:39             ` [RFC PATCH v4 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón
2019-08-07 22:28               ` Junio C Hamano
2019-08-07 21:39             ` [RFC PATCH v4 3/3] grep: avoid leak of chartables in PCRE2 Carlo Marcelo Arenas Belón
2019-08-09  3:02             ` [RFC PATCH v5 0/3] grep: almost no more leaks, hopefully no crashes Carlo Marcelo Arenas Belón
2019-08-09  3:02               ` [RFC PATCH v5 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón
2019-08-09  3:02               ` [RFC PATCH v5 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón
2019-08-27  9:07                 ` Johannes Schindelin
2019-08-27 11:51                   ` Carlo Arenas
2019-10-03  5:02                     ` Junio C Hamano
2019-10-03  8:08                       ` Johannes Schindelin
2019-10-03 11:17                         ` Carlo Arenas
2019-10-03 18:23                           ` Johannes Schindelin
2019-10-03 22:57                           ` Junio C Hamano
2019-08-09  3:02               ` [RFC PATCH v5 3/3] grep: avoid leak of chartables in PCRE2 Carlo Marcelo Arenas Belón
2019-08-09 11:24               ` [RFC PATCH v5 0/3] grep: almost no more leaks, hopefully no crashes Carlo Arenas
2019-08-09 17:01                 ` René Scharfe
2019-08-09 17:46                   ` Junio C Hamano
2019-08-09 21:26                   ` Johannes Schindelin
2019-08-10  3:03                     ` [PATCH] SQUASH Carlo Marcelo Arenas Belón
2019-08-10  7:57                       ` René Scharfe
2019-08-10  8:42                         ` Carlo Arenas
2019-08-10 19:47                           ` René Scharfe
2019-08-12  7:35                             ` Carlo Arenas
2019-08-12 12:14                               ` René Scharfe
2019-08-12 12:28                                 ` Carlo Arenas
2019-08-10 13:57                       ` Johannes Schindelin
2019-08-10  3:05                     ` [RFC PATCH v5 0/3] grep: almost no more leaks, hopefully no crashes Carlo Arenas
2019-08-10  7:56                       ` René Scharfe
2019-08-10 12:40                         ` Carlo Arenas
2019-08-10 21:16                           ` René Scharfe
2019-08-08 20:21           ` [RFC PATCH v3 0/3] grep: no leaks or crashes (windows testing needed) Johannes Schindelin
2019-08-09  6:52             ` Carlo Arenas
2019-08-09 21:13               ` Johannes Schindelin
  -- strict thread matches above, loose matches on Subject: below --
2019-10-16 12:06 [PATCH 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget
2019-10-16 12:06 ` [PATCH 2/3] grep: make PCRE2 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).