* [PATCH 0/3] Revive 'pcre2-chartables-leakfix' @ 2019-10-16 12:06 Johannes Schindelin via GitGitGadget 2019-10-16 12:06 ` [PATCH 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget ` (3 more replies) 0 siblings, 4 replies; 42+ messages in thread From: Johannes Schindelin via GitGitGadget @ 2019-10-16 12:06 UTC (permalink / raw) To: git; +Cc: Carlo Marcelo Arenas Belón, Johannes Schindelin, Junio C Hamano As I had stated several times, I was really unhappy how the original fix harped on nedmalloc and totally ignored runtime-configured custom allocators. So this is, at long last, my attempt to give this a new life. It is based off of maint and needs trivial merge conflict resolutions relative to master . Range-diff relative to cb/pcre2-chartables-leakfix: 1: 770584d697e ! 1: 4feb8cc83a6 grep: make PCRE1 aware of custom allocator @@ Commit message to override the system alocator, and so it is incompatible with USE_NED_ALLOCATOR as reported by Dscho[1] (in similar code from PCRE2) + Note that nedmalloc, as well as other custom allocators like jemalloc + and mi-malloc, can be configured at runtime (via `LD_PRELOAD`), + therefore we cannot know at compile time whether a custom allocator is + used or not. + Make the minimum change possible to ensure this combination is supported - by extending grep_init to set the PCRE1 specific functions to the NED - versions (using the aliased names though) and therefore making sure all - alocations are done inside PCRE1 with the same allocator than the rest - of git. + by extending `grep_init()` to set the PCRE1 specific functions to Git's + idea of `malloc()` and `free()` and therefore making sure all + allocations are done inside PCRE1 with the same allocator than the rest + of Git. - This change might have performance impact (hopefully for the best) and - so will be good idea to test it in a platform where NED might have a - positive impact (ex: Windows 7) + This change has negligible performance impact: PCRE needs to allocate + memory once per program run for the character table and for each pattern + compilation. These are both rare events compared to matching patterns + against lines. Actual measurements[2] show that the impact is lost in + the noise. [1] https://public-inbox.org/git/pull.306.git.gitgitgadget@gmail.com + [2] https://public-inbox.org/git/7f42007f-911b-c570-17f6-1c6af0429586@web.de Signed-off-by: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com> - Signed-off-by: Junio C Hamano <gitster@pobox.com> - - ## Makefile ## -@@ Makefile: ifdef NATIVE_CRLF - endif - - ifdef USE_NED_ALLOCATOR -- COMPAT_CFLAGS += -Icompat/nedmalloc -+ COMPAT_CFLAGS += -DUSE_NED_ALLOCATOR -Icompat/nedmalloc - COMPAT_OBJS += compat/nedmalloc/nedmalloc.o - OVERRIDE_STRDUP = YesPlease - endif + Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> ## grep.c ## @@ grep.c: int grep_config(const char *var, const char *value, void *cb) @@ grep.c: int grep_config(const char *var, const char *value, void *cb) * default values from the template we read the configuration * information in an earlier call to git_config(grep_config). + * -+ * If using PCRE make sure that the library is configured -+ * to use the right allocator (ex: NED) ++ * If using PCRE, make sure that the library is configured ++ * to use the same allocator as Git (e.g. nedmalloc on Windows). */ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix) { struct grep_opt *def = &grep_defaults; int i; -+#ifdef USE_NED_ALLOCATOR +#ifdef USE_LIBPCRE1 + pcre_malloc = malloc; + pcre_free = free; +#endif -+#endif + memset(opt, 0, sizeof(*opt)); opt->repo = repo; 2: cf8d36f34a2 ! 2: 39ad220c18b grep: make PCRE2 aware of custom allocator @@ ## Metadata ## -Author: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com> +Author: Johannes Schindelin <Johannes.Schindelin@gmx.de> ## Commit message ## grep: make PCRE2 aware of custom allocator 94da9193a6 (grep: add support for PCRE v2, 2017-06-01) didn't include a way to override the system allocator, and so it is incompatible with - USE_NED_ALLOCATOR. The problem was made visible when an attempt to - avoid a leak in a data structure that is created by the library was - passed to NED's free for disposal triggering a segfault in Windows. + custom allocators (e.g. nedmalloc). This problem became obvious when we + tried to plug a memory leak by `free()`ing a data structure allocated by + PCRE2, triggering a segfault in Windows (where we use nedmalloc by + default). PCRE2 requires the use of a general context to override the allocator and therefore, there is a lot more code needed than in PCRE1, including @@ Commit message Extend the grep API with a "destructor" that could be called to cleanup any objects that were created and used globally. - Update builtin/grep to use that new API, but any other future - users should make sure to have matching grep_init/grep_destroy calls - if they are using the pattern matching functionality (currently only - relevant when using both NED and PCRE2) + Update `builtin/grep.c` to use that new API, but any other future users + should make sure to have matching `grep_init()`/`grep_destroy()` calls + if they are using the pattern matching functionality. Move some of the logic that was before done per thread (in the workers) - into an earlier phase to avoid degrading performance, but as the use - of PCRE2 with NED is better understood it is expected more of its - functions will be instructed to use the custom allocator as well as + into an earlier phase to avoid degrading performance, but as the use of + PCRE2 with custom allocators is better understood it is expected more of + its functions will be instructed to use the custom allocator as well as was done in the original code[1] this work was based on. [1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com> - Signed-off-by: Junio C Hamano <gitster@pobox.com> + Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> ## builtin/grep.c ## @@ builtin/grep.c: int cmd_grep(int argc, const char **argv, const char *prefix) @@ grep.c: static int grep_source_is_binary(struct grep_source *gs, +#ifdef USE_LIBPCRE2 +static pcre2_general_context *pcre2_global_context; + -+#ifdef USE_NED_ALLOCATOR +static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) +{ + return malloc(size); @@ grep.c: static int grep_source_is_binary(struct grep_source *gs, + return free(pointer); +} +#endif -+#endif + static const char *color_grep_slots[] = { [GREP_COLOR_CONTEXT] = "context", [GREP_COLOR_FILENAME] = "filename", @@ grep.c: int grep_config(const char *var, const char *value, void *cb) * - * If using PCRE make sure that the library is configured - * to use the right allocator (ex: NED) -+ * if any object is created it should be cleaned up in grep_destroy() + * If using PCRE, make sure that the library is configured + * to use the same allocator as Git (e.g. nedmalloc on Windows). ++ * ++ * Any allocated memory needs to be released in grep_destroy(). */ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix) { + struct grep_opt *def = &grep_defaults; + int i; + ++#if defined(USE_LIBPCRE2) ++ if (!pcre2_global_context) ++ pcre2_global_context = pcre2_general_context_create( ++ pcre2_malloc, pcre2_free, NULL); ++#endif ++ + #ifdef USE_LIBPCRE1 + pcre_malloc = malloc; + pcre_free = free; @@ grep.c: void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix color_set(opt->colors[i], def->colors[i]); } @@ grep.c: void grep_init(struct grep_opt *opt, struct repository *repo, const char static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt) { /* -@@ grep.c: void append_header_grep_pattern(struct grep_opt *opt, - void append_grep_pattern(struct grep_opt *opt, const char *pat, - const char *origin, int no, enum grep_pat_token t) - { -+#if defined(USE_LIBPCRE2) && defined(USE_NED_ALLOCATOR) -+ if (!pcre2_global_context && opt->ignore_case && has_non_ascii(pat)) -+ pcre2_global_context = pcre2_general_context_create( -+ pcre2_malloc, pcre2_free, NULL); -+#endif - append_grep_pat(opt, pat, strlen(pat), origin, no, t); - } - @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_compile_context = NULL; @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_ if (opt->ignore_case) { if (has_non_ascii(p->pattern)) { - character_tables = pcre2_maketables(NULL); -+#ifdef USE_NED_ALLOCATOR + if (!pcre2_global_context) + BUG("pcre2_global_context uninitialized"); -+#endif + character_tables = pcre2_maketables(pcre2_global_context); p->pcre2_compile_context = pcre2_compile_context_create(NULL); pcre2_set_character_tables(p->pcre2_compile_context, character_tables); 3: 54aef142151 ! 3: 3ced7c386a0 grep: avoid leak of chartables in PCRE2 @@ ## Metadata ## -Author: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com> +Author: Johannes Schindelin <Johannes.Schindelin@gmx.de> ## Commit message ## grep: avoid leak of chartables in PCRE2 @@ Commit message for PCRE1. Signed-off-by: Carlo Marcelo Arenas Bel├│n <carenas@gmail.com> - Signed-off-by: Junio C Hamano <gitster@pobox.com> + Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> ## grep.c ## @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_ int patinforet; size_t jitsizearg; @@ grep.c: static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt + if (has_non_ascii(p->pattern)) { if (!pcre2_global_context) BUG("pcre2_global_context uninitialized"); - #endif - character_tables = pcre2_maketables(pcre2_global_context); + p->pcre2_tables = pcre2_maketables(pcre2_global_context); p->pcre2_compile_context = pcre2_compile_context_create(NULL); My merge conflict resolution with master [https://github.com/dscho/git/commit/pcre2-chartables-leakfix-merge-with-master] : diff --cc grep.c index 9556d13dc1d,0bb4cbd3d82..b7ae5a442a6 --- a/grep.c +++ b/grep.c @@@ -533,15 -470,11 +506,15 @@@ static void compile_pcre2_pattern(struc p->pcre2_compile_context = NULL; + /* pcre2_global_context is initialized in append_grep_pattern */ if (opt->ignore_case) { - if (has_non_ascii(p->pattern)) { + if (!opt->ignore_locale && has_non_ascii(p->pattern)) { - character_tables = pcre2_maketables(NULL); + if (!pcre2_global_context) + BUG("pcre2_global_context uninitialized"); + p->pcre2_tables = pcre2_maketables(pcre2_global_context); p->pcre2_compile_context = pcre2_compile_context_create(NULL); - pcre2_set_character_tables(p->pcre2_compile_context, character_tables); + pcre2_set_character_tables(p->pcre2_compile_context, + p->pcre2_tables); } options |= PCRE2_CASELESS; } @@@ -643,9 -571,6 +611,7 @@@ static void free_pcre2_pattern(struct g pcre2_compile_context_free(p->pcre2_compile_context); pcre2_code_free(p->pcre2_pattern); pcre2_match_data_free(p->pcre2_match_data); - pcre2_jit_stack_free(p->pcre2_jit_stack); - pcre2_match_context_free(p->pcre2_match_context); + free((void *)p->pcre2_tables); } #else /* !USE_LIBPCRE2 */ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) diff --cc grep.h index 533165c21a5,05dc1bb98e5..811fd274c95 --- a/grep.h +++ b/grep.h @@@ -94,12 -78,9 +78,10 @@@ struct grep_pat pcre2_code *pcre2_pattern; pcre2_match_data *pcre2_match_data; pcre2_compile_context *pcre2_compile_context; - pcre2_match_context *pcre2_match_context; - pcre2_jit_stack *pcre2_jit_stack; + const uint8_t *pcre2_tables; uint32_t pcre2_jit_on; - kwset_t kws; unsigned fixed:1; + unsigned is_fixed:1; unsigned ignore_case:1; unsigned word_regexp:1; }; Carlo Marcelo Arenas Belón (2): grep: make PCRE1 aware of custom allocator grep: make PCRE2 aware of custom allocator Johannes Schindelin (1): grep: avoid leak of chartables in PCRE2 builtin/grep.c | 1 + grep.c | 47 ++++++++++++++++++++++++++++++++++++++++++++--- grep.h | 2 ++ 3 files changed, 47 insertions(+), 3 deletions(-) base-commit: 51cf315870bbb7254ddf06c84fe03b41bc48eebd Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-402%2Fdscho%2Fpcre2-chartables-leakfix-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-402/dscho/pcre2-chartables-leakfix-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/402 -- gitgitgadget ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 1/3] grep: make PCRE1 aware of custom allocator 2019-10-16 12:06 [PATCH 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget @ 2019-10-16 12:06 ` Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-16 12:06 ` [PATCH 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (2 subsequent siblings) 3 siblings, 0 replies; 42+ messages in thread From: Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-16 12:06 UTC (permalink / raw) To: git Cc: Carlo Marcelo Arenas Belón, Johannes Schindelin, Junio C Hamano, Carlo Marcelo Arenas Belón From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@gmail.com> 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) didn't include a way to override the system alocator, and so it is incompatible with USE_NED_ALLOCATOR as reported by Dscho[1] (in similar code from PCRE2) Note that nedmalloc, as well as other custom allocators like jemalloc and mi-malloc, can be configured at runtime (via `LD_PRELOAD`), therefore we cannot know at compile time whether a custom allocator is used or not. Make the minimum change possible to ensure this combination is supported by extending `grep_init()` to set the PCRE1 specific functions to Git's idea of `malloc()` and `free()` and therefore making sure all allocations are done inside PCRE1 with the same allocator than the rest of Git. This change has negligible performance impact: PCRE needs to allocate memory once per program run for the character table and for each pattern compilation. These are both rare events compared to matching patterns against lines. Actual measurements[2] show that the impact is lost in the noise. [1] https://public-inbox.org/git/pull.306.git.gitgitgadget@gmail.com [2] https://public-inbox.org/git/7f42007f-911b-c570-17f6-1c6af0429586@web.de Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- grep.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/grep.c b/grep.c index cd952ef5d3..db6e0d895f 100644 --- a/grep.c +++ b/grep.c @@ -150,12 +150,20 @@ int grep_config(const char *var, const char *value, void *cb) * Initialize one instance of grep_opt and copy the * default values from the template we read the configuration * information in an earlier call to git_config(grep_config). + * + * If using PCRE, make sure that the library is configured + * to use the same allocator as Git (e.g. nedmalloc on Windows). */ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix) { struct grep_opt *def = &grep_defaults; int i; +#ifdef USE_LIBPCRE1 + pcre_malloc = malloc; + pcre_free = free; +#endif + memset(opt, 0, sizeof(*opt)); opt->repo = repo; opt->prefix = prefix; -- gitgitgadget ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 2/3] grep: make PCRE2 aware of custom allocator 2019-10-16 12:06 [PATCH 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget 2019-10-16 12:06 ` [PATCH 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-16 12:06 ` Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-16 12:06 ` [PATCH 3/3] grep: avoid leak of chartables in PCRE2 Johannes Schindelin via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget 3 siblings, 0 replies; 42+ messages in thread From: Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-16 12:06 UTC (permalink / raw) To: git Cc: Carlo Marcelo Arenas Belón, Johannes Schindelin, Junio C Hamano, Carlo Marcelo Arenas Belón From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@gmail.com> 94da9193a6 (grep: add support for PCRE v2, 2017-06-01) didn't include a way to override the system allocator, and so it is incompatible with custom allocators (e.g. nedmalloc). This problem became obvious when we tried to plug a memory leak by `free()`ing a data structure allocated by PCRE2, triggering a segfault in Windows (where we use nedmalloc by default). PCRE2 requires the use of a general context to override the allocator and therefore, there is a lot more code needed than in PCRE1, including a couple of wrapper functions. Extend the grep API with a "destructor" that could be called to cleanup any objects that were created and used globally. Update `builtin/grep.c` to use that new API, but any other future users should make sure to have matching `grep_init()`/`grep_destroy()` calls if they are using the pattern matching functionality. Move some of the logic that was before done per thread (in the workers) into an earlier phase to avoid degrading performance, but as the use of PCRE2 with custom allocators is better understood it is expected more of its functions will be instructed to use the custom allocator as well as was done in the original code[1] this work was based on. [1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- builtin/grep.c | 1 + grep.c | 34 +++++++++++++++++++++++++++++++++- grep.h | 1 + 3 files changed, 35 insertions(+), 1 deletion(-) diff --git a/builtin/grep.c b/builtin/grep.c index 560051784e..e49c20df60 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1145,5 +1145,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix) run_pager(&opt, prefix); clear_pathspec(&pathspec); free_grep_patterns(&opt); + grep_destroy(); return !hit; } diff --git a/grep.c b/grep.c index db6e0d895f..296edbb56f 100644 --- a/grep.c +++ b/grep.c @@ -16,6 +16,20 @@ static int grep_source_is_binary(struct grep_source *gs, static struct grep_opt grep_defaults; +#ifdef USE_LIBPCRE2 +static pcre2_general_context *pcre2_global_context; + +static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) +{ + return malloc(size); +} + +static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) +{ + return free(pointer); +} +#endif + static const char *color_grep_slots[] = { [GREP_COLOR_CONTEXT] = "context", [GREP_COLOR_FILENAME] = "filename", @@ -153,12 +167,20 @@ int grep_config(const char *var, const char *value, void *cb) * * If using PCRE, make sure that the library is configured * to use the same allocator as Git (e.g. nedmalloc on Windows). + * + * Any allocated memory needs to be released in grep_destroy(). */ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix) { struct grep_opt *def = &grep_defaults; int i; +#if defined(USE_LIBPCRE2) + if (!pcre2_global_context) + pcre2_global_context = pcre2_general_context_create( + pcre2_malloc, pcre2_free, NULL); +#endif + #ifdef USE_LIBPCRE1 pcre_malloc = malloc; pcre_free = free; @@ -186,6 +208,13 @@ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix color_set(opt->colors[i], def->colors[i]); } +void grep_destroy(void) +{ +#ifdef USE_LIBPCRE2 + pcre2_general_context_free(pcre2_global_context); +#endif +} + static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt) { /* @@ -505,9 +534,12 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_compile_context = NULL; + /* pcre2_global_context is initialized in append_grep_pattern */ if (opt->ignore_case) { if (has_non_ascii(p->pattern)) { - character_tables = pcre2_maketables(NULL); + if (!pcre2_global_context) + BUG("pcre2_global_context uninitialized"); + character_tables = pcre2_maketables(pcre2_global_context); p->pcre2_compile_context = pcre2_compile_context_create(NULL); pcre2_set_character_tables(p->pcre2_compile_context, character_tables); } diff --git a/grep.h b/grep.h index 1875880f37..526c2db9ef 100644 --- a/grep.h +++ b/grep.h @@ -189,6 +189,7 @@ struct grep_opt { void init_grep_defaults(struct repository *); int grep_config(const char *var, const char *value, void *); void grep_init(struct grep_opt *, struct repository *repo, const char *prefix); +void grep_destroy(void); void grep_commit_pattern_type(enum grep_pattern_type, struct grep_opt *opt); void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t); -- gitgitgadget ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 3/3] grep: avoid leak of chartables in PCRE2 2019-10-16 12:06 [PATCH 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget 2019-10-16 12:06 ` [PATCH 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-16 12:06 ` [PATCH 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-16 12:06 ` Johannes Schindelin via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget 3 siblings, 0 replies; 42+ messages in thread From: Johannes Schindelin via GitGitGadget @ 2019-10-16 12:06 UTC (permalink / raw) To: git Cc: Carlo Marcelo Arenas Belón, Johannes Schindelin, Junio C Hamano, Johannes Schindelin From: Johannes Schindelin <johannes.schindelin@gmx.de> 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) introduced a small memory leak visible with valgrind in t7813. Complete the creation of a PCRE2 specific variable that was missing from the original change and free the generated table just like it is done for PCRE1. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- grep.c | 7 ++++--- grep.h | 1 + 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/grep.c b/grep.c index 296edbb56f..9556d13dc1 100644 --- a/grep.c +++ b/grep.c @@ -525,7 +525,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt PCRE2_UCHAR errbuf[256]; PCRE2_SIZE erroffset; int options = PCRE2_MULTILINE; - const uint8_t *character_tables = NULL; int jitret; int patinforet; size_t jitsizearg; @@ -539,9 +538,10 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt if (has_non_ascii(p->pattern)) { if (!pcre2_global_context) BUG("pcre2_global_context uninitialized"); - character_tables = pcre2_maketables(pcre2_global_context); + p->pcre2_tables = pcre2_maketables(pcre2_global_context); p->pcre2_compile_context = pcre2_compile_context_create(NULL); - pcre2_set_character_tables(p->pcre2_compile_context, character_tables); + pcre2_set_character_tables(p->pcre2_compile_context, + p->pcre2_tables); } options |= PCRE2_CASELESS; } @@ -645,6 +645,7 @@ static void free_pcre2_pattern(struct grep_pat *p) pcre2_match_data_free(p->pcre2_match_data); pcre2_jit_stack_free(p->pcre2_jit_stack); pcre2_match_context_free(p->pcre2_match_context); + free((void *)p->pcre2_tables); } #else /* !USE_LIBPCRE2 */ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) diff --git a/grep.h b/grep.h index 526c2db9ef..533165c21a 100644 --- a/grep.h +++ b/grep.h @@ -96,6 +96,7 @@ struct grep_pat { pcre2_compile_context *pcre2_compile_context; pcre2_match_context *pcre2_match_context; pcre2_jit_stack *pcre2_jit_stack; + const uint8_t *pcre2_tables; uint32_t pcre2_jit_on; kwset_t kws; unsigned fixed:1; -- gitgitgadget ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v2 0/3] Revive 'pcre2-chartables-leakfix' 2019-10-16 12:06 [PATCH 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget ` (2 preceding siblings ...) 2019-10-16 12:06 ` [PATCH 3/3] grep: avoid leak of chartables in PCRE2 Johannes Schindelin via GitGitGadget @ 2019-10-16 12:10 ` Johannes Schindelin via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget ` (2 more replies) 3 siblings, 3 replies; 42+ messages in thread From: Johannes Schindelin via GitGitGadget @ 2019-10-16 12:10 UTC (permalink / raw) To: git; +Cc: Carlo Marcelo Arenas Belón, Johannes Schindelin, Junio C Hamano As I had stated several times, I was really unhappy how the original fix harped on nedmalloc and totally ignored runtime-configured custom allocators. So this is, at long last, my attempt to give this a new life. It is based off of maint and needs trivial merge conflict resolutions relative to master . Changes since v1: * I managed to mess up the authorship of 3/3. Sorry for that. I fixed it, so that Carlo is shown as author again. Carlo Marcelo Arenas Belón (3): grep: make PCRE1 aware of custom allocator grep: make PCRE2 aware of custom allocator grep: avoid leak of chartables in PCRE2 builtin/grep.c | 1 + grep.c | 47 ++++++++++++++++++++++++++++++++++++++++++++--- grep.h | 2 ++ 3 files changed, 47 insertions(+), 3 deletions(-) base-commit: 51cf315870bbb7254ddf06c84fe03b41bc48eebd Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-402%2Fdscho%2Fpcre2-chartables-leakfix-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-402/dscho/pcre2-chartables-leakfix-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/402 Range-diff vs v1: 1: 4feb8cc83a = 1: 4feb8cc83a grep: make PCRE1 aware of custom allocator 2: 191d3a2280 = 2: 191d3a2280 grep: make PCRE2 aware of custom allocator 3: f21b2c9eb5 ! 3: f8724fb267 grep: avoid leak of chartables in PCRE2 @@ -1,4 +1,4 @@ -Author: Johannes Schindelin <johannes.schindelin@gmx.de> +Author: Carlo Marcelo Arenas Belón <carenas@gmail.com> grep: avoid leak of chartables in PCRE2 -- gitgitgadget ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v2 1/3] grep: make PCRE1 aware of custom allocator 2019-10-16 12:10 ` [PATCH v2 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget @ 2019-10-16 12:10 ` Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 3/3] grep: avoid leak of chartables in PCRE2 Carlo Marcelo Arenas Belón via GitGitGadget 2 siblings, 0 replies; 42+ messages in thread From: Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-16 12:10 UTC (permalink / raw) To: git Cc: Carlo Marcelo Arenas Belón, Johannes Schindelin, Junio C Hamano, Carlo Marcelo Arenas Belón From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@gmail.com> 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) didn't include a way to override the system alocator, and so it is incompatible with USE_NED_ALLOCATOR as reported by Dscho[1] (in similar code from PCRE2) Note that nedmalloc, as well as other custom allocators like jemalloc and mi-malloc, can be configured at runtime (via `LD_PRELOAD`), therefore we cannot know at compile time whether a custom allocator is used or not. Make the minimum change possible to ensure this combination is supported by extending `grep_init()` to set the PCRE1 specific functions to Git's idea of `malloc()` and `free()` and therefore making sure all allocations are done inside PCRE1 with the same allocator than the rest of Git. This change has negligible performance impact: PCRE needs to allocate memory once per program run for the character table and for each pattern compilation. These are both rare events compared to matching patterns against lines. Actual measurements[2] show that the impact is lost in the noise. [1] https://public-inbox.org/git/pull.306.git.gitgitgadget@gmail.com [2] https://public-inbox.org/git/7f42007f-911b-c570-17f6-1c6af0429586@web.de Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- grep.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/grep.c b/grep.c index cd952ef5d3..db6e0d895f 100644 --- a/grep.c +++ b/grep.c @@ -150,12 +150,20 @@ int grep_config(const char *var, const char *value, void *cb) * Initialize one instance of grep_opt and copy the * default values from the template we read the configuration * information in an earlier call to git_config(grep_config). + * + * If using PCRE, make sure that the library is configured + * to use the same allocator as Git (e.g. nedmalloc on Windows). */ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix) { struct grep_opt *def = &grep_defaults; int i; +#ifdef USE_LIBPCRE1 + pcre_malloc = malloc; + pcre_free = free; +#endif + memset(opt, 0, sizeof(*opt)); opt->repo = repo; opt->prefix = prefix; -- gitgitgadget ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v2 2/3] grep: make PCRE2 aware of custom allocator 2019-10-16 12:10 ` [PATCH v2 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-16 12:10 ` Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-18 1:38 ` Junio C Hamano ` (11 more replies) 2019-10-16 12:10 ` [PATCH v2 3/3] grep: avoid leak of chartables in PCRE2 Carlo Marcelo Arenas Belón via GitGitGadget 2 siblings, 12 replies; 42+ messages in thread From: Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-16 12:10 UTC (permalink / raw) To: git Cc: Carlo Marcelo Arenas Belón, Johannes Schindelin, Junio C Hamano, Carlo Marcelo Arenas Belón From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@gmail.com> 94da9193a6 (grep: add support for PCRE v2, 2017-06-01) didn't include a way to override the system allocator, and so it is incompatible with custom allocators (e.g. nedmalloc). This problem became obvious when we tried to plug a memory leak by `free()`ing a data structure allocated by PCRE2, triggering a segfault in Windows (where we use nedmalloc by default). PCRE2 requires the use of a general context to override the allocator and therefore, there is a lot more code needed than in PCRE1, including a couple of wrapper functions. Extend the grep API with a "destructor" that could be called to cleanup any objects that were created and used globally. Update `builtin/grep.c` to use that new API, but any other future users should make sure to have matching `grep_init()`/`grep_destroy()` calls if they are using the pattern matching functionality. Move some of the logic that was before done per thread (in the workers) into an earlier phase to avoid degrading performance, but as the use of PCRE2 with custom allocators is better understood it is expected more of its functions will be instructed to use the custom allocator as well as was done in the original code[1] this work was based on. [1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- builtin/grep.c | 1 + grep.c | 34 +++++++++++++++++++++++++++++++++- grep.h | 1 + 3 files changed, 35 insertions(+), 1 deletion(-) diff --git a/builtin/grep.c b/builtin/grep.c index 560051784e..e49c20df60 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1145,5 +1145,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix) run_pager(&opt, prefix); clear_pathspec(&pathspec); free_grep_patterns(&opt); + grep_destroy(); return !hit; } diff --git a/grep.c b/grep.c index db6e0d895f..296edbb56f 100644 --- a/grep.c +++ b/grep.c @@ -16,6 +16,20 @@ static int grep_source_is_binary(struct grep_source *gs, static struct grep_opt grep_defaults; +#ifdef USE_LIBPCRE2 +static pcre2_general_context *pcre2_global_context; + +static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) +{ + return malloc(size); +} + +static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) +{ + return free(pointer); +} +#endif + static const char *color_grep_slots[] = { [GREP_COLOR_CONTEXT] = "context", [GREP_COLOR_FILENAME] = "filename", @@ -153,12 +167,20 @@ int grep_config(const char *var, const char *value, void *cb) * * If using PCRE, make sure that the library is configured * to use the same allocator as Git (e.g. nedmalloc on Windows). + * + * Any allocated memory needs to be released in grep_destroy(). */ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix) { struct grep_opt *def = &grep_defaults; int i; +#if defined(USE_LIBPCRE2) + if (!pcre2_global_context) + pcre2_global_context = pcre2_general_context_create( + pcre2_malloc, pcre2_free, NULL); +#endif + #ifdef USE_LIBPCRE1 pcre_malloc = malloc; pcre_free = free; @@ -186,6 +208,13 @@ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix color_set(opt->colors[i], def->colors[i]); } +void grep_destroy(void) +{ +#ifdef USE_LIBPCRE2 + pcre2_general_context_free(pcre2_global_context); +#endif +} + static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt) { /* @@ -505,9 +534,12 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_compile_context = NULL; + /* pcre2_global_context is initialized in append_grep_pattern */ if (opt->ignore_case) { if (has_non_ascii(p->pattern)) { - character_tables = pcre2_maketables(NULL); + if (!pcre2_global_context) + BUG("pcre2_global_context uninitialized"); + character_tables = pcre2_maketables(pcre2_global_context); p->pcre2_compile_context = pcre2_compile_context_create(NULL); pcre2_set_character_tables(p->pcre2_compile_context, character_tables); } diff --git a/grep.h b/grep.h index 1875880f37..526c2db9ef 100644 --- a/grep.h +++ b/grep.h @@ -189,6 +189,7 @@ struct grep_opt { void init_grep_defaults(struct repository *); int grep_config(const char *var, const char *value, void *); void grep_init(struct grep_opt *, struct repository *repo, const char *prefix); +void grep_destroy(void); void grep_commit_pattern_type(enum grep_pattern_type, struct grep_opt *opt); void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t); -- gitgitgadget ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v2 2/3] grep: make PCRE2 aware of custom allocator 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-18 1:38 ` Junio C Hamano 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (10 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2019-10-18 1:38 UTC (permalink / raw) To: Carlo Marcelo Arenas Belón via GitGitGadget Cc: git, Carlo Marcelo Arenas Belón, Johannes Schindelin "Carlo Marcelo Arenas Belón via GitGitGadget" <gitgitgadget@gmail.com> writes: > builtin/grep.c | 1 + > grep.c | 34 +++++++++++++++++++++++++++++++++- > grep.h | 1 + > 3 files changed, 35 insertions(+), 1 deletion(-) > > +#if defined(USE_LIBPCRE2) > + if (!pcre2_global_context) > + pcre2_global_context = pcre2_general_context_create( > + pcre2_malloc, pcre2_free, NULL); > +#endif This part should use the same "#ifdef" as the other one which is a minor nit, for consistency. I do not care too deeply which way we unify, but we should stick to one. > + > #ifdef USE_LIBPCRE1 > pcre_malloc = malloc; > pcre_free = free; Other than that, all 3 patches look sensible, and they certainly got simplified by dropping the conditional ;-). ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 00/10] grep/pcre2: memory allocation fixes 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-18 1:38 ` Junio C Hamano @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-10 21:34 ` Junio C Hamano ` (11 more replies) 2021-02-04 21:05 ` [PATCH 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason ` (9 subsequent siblings) 11 siblings, 12 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason This series fixes up bugs in our PCRE v2 wrapper code and how it handles malloc()/free(). Junio: I'm splitting this off my recently sent 25 patch series, which I should probably have sent as an RFC: https://lore.kernel.org/git/20210203032811.14979-1-avarab@gmail.com/ It's on top of "next", as it would otherwise conflict with my in-flight ab/grep-pcre-invalid-utf8, ab/lose-grep-debug and ab/retire-pcre1. 06/10 is a follow-up improvement (not a fix, the in-flight works fine too) for ab/grep-pcre-invalid-utf8. The latter two just touch adjacent lines of code. There's no notable new behavior here, just cleanup of existing functionality. In mid-2019 there was a lot of discussion around the code being fixed here: https://lore.kernel.org/git/pull.306.git.gitgitgadget@gmail.com/#t https://lore.kernel.org/git/pull.402.git.1571227613.gitgitgadget@gmail.com/ As discussed in 08/10 I believe that fix was so difficult to get right because it was starting out with a fundamentally incorrect assumption about how PCRE v2's memory handling works. With 08-10/10 we end up with a much easier to reason about end-state, as the API itself is actually quite simple. Ævar Arnfjörð Bjarmason (10): grep/pcre2: drop needless assignment + assert() on opt->pcre2 grep/pcre2: drop needless assignment to NULL grep/pcre2: correct reference to grep_init() in comment grep/pcre2: prepare to add debugging to pcre2_malloc() grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode grep/pcre2: use compile-time PCREv2 version test grep/pcre2: use pcre2_maketables_free() function grep/pcre2: actually make pcre2 use custom allocator grep/pcre2: move back to thread-only PCREv2 structures grep/pcre2: move definitions of pcre2_{malloc,free} builtin/grep.c | 1 - grep.c | 99 ++++++++++++++++++++++---------------------------- grep.h | 9 ++++- 3 files changed, 51 insertions(+), 58 deletions(-) -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 00/10] grep/pcre2: memory allocation fixes 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason @ 2021-02-10 21:34 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 " Ævar Arnfjörð Bjarmason ` (10 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2021-02-10 21:34 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > This series fixes up bugs in our PCRE v2 wrapper code and how it > handles malloc()/free(). > > Junio: I'm splitting this off my recently sent 25 patch series, which > I should probably have sent as an RFC: > https://lore.kernel.org/git/20210203032811.14979-1-avarab@gmail.com/ > > It's on top of "next", as it would otherwise conflict with my > in-flight ab/grep-pcre-invalid-utf8, ab/lose-grep-debug and ab/retire-pcre1. These three seem to be solid and I am planning to merge them down to 'master' soonish. I was hoping that the series would get enough reviews by the time it happens so that I can expect an update that cleanly applies on top of 'master', and the plan seems to be working well ;-) ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v2 00/10] grep/pcre2: memory allocation fixes 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason 2021-02-10 21:34 ` Junio C Hamano @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-03-04 0:34 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason ` (9 subsequent siblings) 11 siblings, 1 reply; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Now based on "master", when I sent v1[1] it dependen on other in-flight topics of mine. Addresses minor issues with the commit messages raised by Johannes on v1, and other commit message issues I noticed myself on re-reading this. 1. https://lore.kernel.org/git/20210204210556.25242-1-avarab@gmail.com/ Ævar Arnfjörð Bjarmason (10): grep/pcre2: drop needless assignment + assert() on opt->pcre2 grep/pcre2: drop needless assignment to NULL grep/pcre2: correct reference to grep_init() in comment grep/pcre2: prepare to add debugging to pcre2_malloc() grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode grep/pcre2: use compile-time PCREv2 version test grep/pcre2: use pcre2_maketables_free() function grep/pcre2: actually make pcre2 use custom allocator grep/pcre2: move back to thread-only PCREv2 structures grep/pcre2: move definitions of pcre2_{malloc,free} builtin/grep.c | 1 - grep.c | 99 ++++++++++++++++++++++---------------------------- grep.h | 9 ++++- 3 files changed, 51 insertions(+), 58 deletions(-) Range-diff: 1: a11f1e91552 ! 1: 40d2e47c540 grep/pcre2: drop needless assignment + assert() on opt->pcre2 @@ Commit message There was never a good reason for this, it's just a relic from when I initially wrote the PCREv2 support. We're not going to have confusion about compile_pcre2_pattern() being called when it shouldn't just - because we forgot to cargo-cult this opt->pcre2 option, and "opt" - is (mostly) used for the options the user supplied, let's avoid the - pattern of needlessly assigning to it. + because we forgot to cargo-cult this opt->pcre2 option. - With my in-flight removal of PCREv1 [1] ("Remove support for v1 of the - PCRE library", 2021-01-24) there'll be even less confusion around what - we call where in these codepaths, which is one more reason to remove - this. + Furthermore the "struct grep_opt" is (mostly) used for the options the + user supplied, let's avoid the pattern of needlessly assigning to it. - 1. https://lore.kernel.org/git/xmqqmtwy29x8.fsf@gitster.c.googlers.com/ + With my recent removal of the PCREv1 backend in 7599730b7e2 (Remove + support for v1 of the PCRE library, 2021-01-24) there's even less + confusion around what we call where in these codepaths, which is one + more reason to remove this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> 2: db0ef9189e3 ! 2: e02f9b5ab50 grep/pcre2: drop needless assignment to NULL @@ Commit message grep/pcre2: drop needless assignment to NULL Remove a redundant assignment of pcre2_compile_context dating back to - my 94da9193a6e (grep: add support for PCRE v2, 2017-06-01). In - create_grep_pat() we xcalloc() the "grep_pat" struct, so there's no + my 94da9193a6e (grep: add support for PCRE v2, 2017-06-01). + + In create_grep_pat() we xcalloc() the "grep_pat" struct, so there's no need to NULL out individual members here. I think this was probably something left over from an earlier 3: 9c5429f4d96 = 3: 2bcb6c53589 grep/pcre2: correct reference to grep_init() in comment 4: e5558f5f0f1 ! 4: 78477193cd4 grep/pcre2: prepare to add debugging to pcre2_malloc() @@ Commit message grep/pcre2: prepare to add debugging to pcre2_malloc() Change pcre2_malloc() in a way that'll make it easier for a debugging - fprintf() to spew out the allocated pointer. This doesn't introduce - any functional change, it just makes a subsequent commit's diff easier - to read. Changes code added in 513f2b0bbd4 (grep: make PCRE2 aware of - custom allocator, 2019-10-16). + fprintf() to spew out the allocated pointer. + + This doesn't introduce any functional change, it just makes a + subsequent commit's diff easier to read. Changes code added in + 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> 5: 7968effaa8a ! 5: cbeb521bd65 grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode @@ Commit message grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Add optional printing of PCREv2 allocations to stderr for a developer - who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to - "1". + who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to "1". + + You need to manually change the definition in the source file similar + to the DEBUG_MAILMAP, there's no Makefile knob for this. This will be referenced a subsequent commit, and is generally useful to manually see what's going on with PCREv2 allocations while working 6: 604e183c224 = 6: 788929c3de6 grep/pcre2: use compile-time PCREv2 version test 7: f864a9aac4c ! 7: 6a4552c6d4e grep/pcre2: use pcre2_maketables_free() function @@ Commit message grep/pcre2: use pcre2_maketables_free() function Make use of the pcre2_maketables_free() function to free the memory - allocated by pcre2_maketables(). At first sight it's strange that - 10da030ab75 (grep: avoid leak of chartables in PCRE2, 2019-10-16) - which added the free() call here doesn't make use of the pcre2_free() - the author introduced in the preceding commit in 513f2b0bbd4 (grep: - make PCRE2 aware of custom allocator, 2019-10-16). + allocated by pcre2_maketables(). + + At first sight it's strange that 10da030ab75 (grep: avoid leak of + chartables in PCRE2, 2019-10-16) which added the free() call here + doesn't make use of the pcre2_free() the author introduced in the + preceding commit in 513f2b0bbd4 (grep: make PCRE2 aware of custom + allocator, 2019-10-16). The reason is that at the time the function didn't exist. It was first introduced in PCREv2 version 10.34, released on 2019-11-21. Let's make use of it behind a macro. I don't think this matters for anything to do with custom allocators, but it makes our use of PCREv2 - more discoverable. At some distant point in the future we'll be able - to drop the version guard, as nobody will be running a version older - than 10.34. + more discoverable. + + At some distant point in the future we'll be able to drop the version + guard, as nobody will be running a version older than 10.34. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> 8: cea9e066951 ! 8: bf58d597a4b grep/pcre2: actually make pcre2 use custom allocator @@ Commit message functions for allocation. We'll now use it for all PCREv2 allocations. The reason 513f2b0bbd4 worked as a bugfix for the USE_NED_ALLOCATOR - issue is because it managed to target pretty much the allocation freed - via free(), as opposed to by a pcre2_*free() function. I.e. the - pcre2_maketables() and pcre2_maketables_free() pair. For most of the - rest we continued allocating with stock malloc() inside PCREv2 itself, - but didn't segfault because we'd use its corresponding free(). + issue is because it targeted the allocation freed via free(), as + opposed to by a pcre2_*free() function. I.e. the pcre2_maketables() + and pcre2_maketables_free() pair. + + For most of the rest we continued allocating with stock malloc() + inside PCREv2 itself, but didn't segfault because we'd use its + corresponding free(). In a preceding commit of mine I changed the free() to pcre2_maketables_free() on versions of PCREv2 10.34 and newer. So as @@ Commit message grep --threads=1 -iP æ.*var.*xyz Only use pcre2_{malloc,free}() for 2 malloc() calls and 2 - corresponding free() call. Now it's 12 calls to each. This can be + corresponding free() calls. Now it's 12 calls to each. This can be observed with the GREP_PCRE2_DEBUG_MALLOC debug mode. Reading the history of how this bug got introduced it wasn't present @@ Commit message Thus the failed attempts to go down the route of only creating the general context in cases where we ourselves call pcre2_maketables(), before finally settling on the approach 513f2b0bbd4 took of always - creating it. + creating it, but then mostly not using it. Instead we should always create it, and then pass the general context to those functions that accept it, so that they'll consistently use our preferred memory allocation functions. - 1. https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ + 1. https://lore.kernel.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ 2. https://lore.kernel.org/git/CAPUEsphMh_ZqcH3M7PXC9jHTfEdQN3mhTAK2JDkdvKBp53YBoA@mail.gmail.com/ 3. https://lore.kernel.org/git/20190806085014.47776-3-carenas@gmail.com/ 9: 99eaa918261 ! 9: 129adf872fb grep/pcre2: move back to thread-only PCREv2 structures @@ Commit message grep/pcre2: move back to thread-only PCREv2 structures Change the setup of the "pcre2_general_context" to happen per-thread - in compile_pcre2_pattern() instead of in grep_init(), as happens with - all the rest of the pcre2_* members of the grep_pat structure. + in compile_pcre2_pattern() instead of in grep_init(). + + This change brings it in line with how the rest of the pcre2_* members + in the grep_pat structure are set up. As noted in the preceding commit the approach 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16) took to allocate the pcre2_general_context seems to have been initially based on a misunderstanding of how PCREv2 memory allocation works. - This approach of creating a global context is just added complexity - for almost zero gain. On my system it's 24 bytes saved per-thread, for - context PCREv2 will then go on to some kilobytes for its own - thread-local state. + The approach of creating a global context in grep_init() is just added + complexity for almost zero gain. On my system it's 24 bytes saved + per-thread. For comparison PCREv2 will then go on to allocate at least + a kilobyte for its own thread-local state. As noted in 6d423dd542f (grep: don't redundantly compile throwaway patterns under threading, 2017-05-25) the grep code is intentionally @@ Commit message structures globally, while making others thread-local. So let's remove this special case and make all of them thread-local - for simplicity again. + again for simplicity. With this change we could move the + pcre2_{malloc,free} functions around to live closer to their current + use. I'm not doing that here to keep this change small, that cleanup + will be done in a follow-up commit. See also the discussion in 94da9193a6 (grep: add support for PCRE v2, 2017-06-01) about thread safety, and Johannes's comments[1] to the 10: 462f12126c8 ! 10: 688d1b00d5d grep/pcre2: move definitions of pcre2_{malloc,free} @@ Commit message grep/pcre2: move definitions of pcre2_{malloc,free} Move the definitions of the pcre2_{malloc,free} functions above the - compile_pcre2_pattern() function they're used it. Before the preceding - commit they used to be needed earlier, but now we can move them to be - adjacent to the other PCREv2 functions. + compile_pcre2_pattern() function they're used in. + + Before the preceding commit they used to be needed earlier, but now we + can move them to be adjacent to the other PCREv2 functions. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 00/10] grep/pcre2: memory allocation fixes 2021-02-18 0:07 ` [PATCH v2 " Ævar Arnfjörð Bjarmason @ 2021-03-04 0:34 ` Junio C Hamano 0 siblings, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2021-03-04 0:34 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Now based on "master", when I sent v1[1] it dependen on other > in-flight topics of mine. This has been in "Needs review" state for too long, so I looked at them myself. Aside from a few minor issues, they all made sense to me. Thanks. ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v2 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason 2021-02-10 21:34 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 " Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 02/10] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason ` (8 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Drop an assignment added in b65abcafc7a (grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) and the overly cautious assert() I added in 94da9193a6e (grep: add support for PCRE v2, 2017-06-01). There was never a good reason for this, it's just a relic from when I initially wrote the PCREv2 support. We're not going to have confusion about compile_pcre2_pattern() being called when it shouldn't just because we forgot to cargo-cult this opt->pcre2 option. Furthermore the "struct grep_opt" is (mostly) used for the options the user supplied, let's avoid the pattern of needlessly assigning to it. With my recent removal of the PCREv1 backend in 7599730b7e2 (Remove support for v1 of the PCRE library, 2021-01-24) there's even less confusion around what we call where in these codepaths, which is one more reason to remove this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/grep.c b/grep.c index aabfaaa4c32..816e23f17ef 100644 --- a/grep.c +++ b/grep.c @@ -373,8 +373,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt int patinforet; size_t jitsizearg; - assert(opt->pcre2); - p->pcre2_compile_context = NULL; /* pcre2_global_context is initialized in append_grep_pattern */ @@ -555,7 +553,6 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) #endif if (p->fixed || p->is_fixed) { #ifdef USE_LIBPCRE2 - opt->pcre2 = 1; if (p->is_fixed) { compile_pcre2_pattern(p, opt); } else { -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v2 02/10] grep/pcre2: drop needless assignment to NULL 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (2 preceding siblings ...) 2021-02-18 0:07 ` [PATCH v2 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 03/10] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason ` (7 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Remove a redundant assignment of pcre2_compile_context dating back to my 94da9193a6e (grep: add support for PCRE v2, 2017-06-01). In create_grep_pat() we xcalloc() the "grep_pat" struct, so there's no need to NULL out individual members here. I think this was probably something left over from an earlier development version of mine. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/grep.c b/grep.c index 816e23f17ef..f27c5de7f56 100644 --- a/grep.c +++ b/grep.c @@ -373,8 +373,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt int patinforet; size_t jitsizearg; - p->pcre2_compile_context = NULL; - /* pcre2_global_context is initialized in append_grep_pattern */ if (opt->ignore_case) { if (!opt->ignore_locale && has_non_ascii(p->pattern)) { -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v2 03/10] grep/pcre2: correct reference to grep_init() in comment 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (3 preceding siblings ...) 2021-02-18 0:07 ` [PATCH v2 02/10] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason ` (6 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Correct a comment added in 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16). This comment was never correct in git.git, but was consistent with an older version of the patch[1]. 1. https://lore.kernel.org/git/20190806163658.66932-3-carenas@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/grep.c b/grep.c index f27c5de7f56..b9adcd83e7a 100644 --- a/grep.c +++ b/grep.c @@ -373,7 +373,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt int patinforet; size_t jitsizearg; - /* pcre2_global_context is initialized in append_grep_pattern */ + /* pcre2_global_context is initialized in grep_init */ if (opt->ignore_case) { if (!opt->ignore_locale && has_non_ascii(p->pattern)) { if (!pcre2_global_context) -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v2 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (4 preceding siblings ...) 2021-02-18 0:07 ` [PATCH v2 03/10] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason ` (5 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Change pcre2_malloc() in a way that'll make it easier for a debugging fprintf() to spew out the allocated pointer. This doesn't introduce any functional change, it just makes a subsequent commit's diff easier to read. Changes code added in 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/grep.c b/grep.c index b9adcd83e7a..f96d86c9293 100644 --- a/grep.c +++ b/grep.c @@ -45,7 +45,8 @@ static pcre2_general_context *pcre2_global_context; static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) { - return malloc(size); + void *pointer = malloc(size); + return pointer; } static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v2 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (5 preceding siblings ...) 2021-02-18 0:07 ` [PATCH v2 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 06/10] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason ` (4 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Add optional printing of PCREv2 allocations to stderr for a developer who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to "1". You need to manually change the definition in the source file similar to the DEBUG_MAILMAP, there's no Makefile knob for this. This will be referenced a subsequent commit, and is generally useful to manually see what's going on with PCREv2 allocations while working on that code. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/grep.c b/grep.c index f96d86c9293..7d262a23d88 100644 --- a/grep.c +++ b/grep.c @@ -42,15 +42,25 @@ static struct grep_opt grep_defaults = { #ifdef USE_LIBPCRE2 static pcre2_general_context *pcre2_global_context; +#define GREP_PCRE2_DEBUG_MALLOC 0 static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) { void *pointer = malloc(size); +#if GREP_PCRE2_DEBUG_MALLOC + static int count = 1; + fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size); +#endif return pointer; } static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) { +#if GREP_PCRE2_DEBUG_MALLOC + static int count = 1; + if (pointer) + fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++); +#endif free(pointer); } #endif -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v2 06/10] grep/pcre2: use compile-time PCREv2 version test 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (6 preceding siblings ...) 2021-02-18 0:07 ` [PATCH v2 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-03-04 0:14 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason ` (3 subsequent siblings) 11 siblings, 1 reply; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Replace a use of pcre2_config(PCRE2_CONFIG_VERSION, ...) which I added in 95ca1f987ed (grep/pcre2: better support invalid UTF-8 haystacks, 2021-01-24) with the same test done at compile-time. It might be cuter to do this at runtime since we don't have to do the "major >= 11 || (major >= 10 && ...)" test. But in the next commit we'll add another version comparison that absolutely needs to be done at compile-time, so we're better of being consistent across the board. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 18 ++++-------------- grep.h | 3 +++ 2 files changed, 7 insertions(+), 14 deletions(-) diff --git a/grep.c b/grep.c index 7d262a23d88..e58044474dc 100644 --- a/grep.c +++ b/grep.c @@ -400,21 +400,11 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt !(!opt->ignore_case && (p->fixed || p->is_fixed))) options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF); +#ifdef GIT_PCRE2_VERSION_10_36_OR_HIGHER /* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */ - if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS)) { - struct strbuf buf; - int len; - int err; - - if ((len = pcre2_config(PCRE2_CONFIG_VERSION, NULL)) < 0) - BUG("pcre2_config(..., NULL) failed: %d", len); - strbuf_init(&buf, len + 1); - if ((err = pcre2_config(PCRE2_CONFIG_VERSION, buf.buf)) < 0) - BUG("pcre2_config(..., buf.buf) failed: %d", err); - if (versioncmp(buf.buf, "10.36") < 0) - options |= PCRE2_NO_START_OPTIMIZE; - strbuf_release(&buf); - } + if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS)) + options |= PCRE2_NO_START_OPTIMIZE; +#endif p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern, p->patternlen, options, &error, &erroffset, diff --git a/grep.h b/grep.h index ae89d6254b3..54e52042cb9 100644 --- a/grep.h +++ b/grep.h @@ -4,6 +4,9 @@ #ifdef USE_LIBPCRE2 #define PCRE2_CODE_UNIT_WIDTH 8 #include <pcre2.h> +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 +#define GIT_PCRE2_VERSION_10_36_OR_HIGHER +#endif #else typedef int pcre2_code; typedef int pcre2_match_data; -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v2 06/10] grep/pcre2: use compile-time PCREv2 version test 2021-02-18 0:07 ` [PATCH v2 06/10] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason @ 2021-03-04 0:14 ` Junio C Hamano 0 siblings, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2021-03-04 0:14 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 > +#define GIT_PCRE2_VERSION_10_36_OR_HIGHER > +#endif This is not wrong per-se but it does look misleading not to spell it as #if (PCRE2_MAJOR == 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 which would convey the intention much clearly. Hint to notice the difference: imagine if it were "9.37 and later in 9.X series, 10.36 and later in 10.X series and anything after 11 are OK". In other words, the minor version is always tied to a particular major version and "major >= X && minor >= Y" is often a bug, even though in this case it happens to be OK only because 10 and 11 are consecutive. ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v2 07/10] grep/pcre2: use pcre2_maketables_free() function 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (7 preceding siblings ...) 2021-02-18 0:07 ` [PATCH v2 06/10] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason ` (2 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Make use of the pcre2_maketables_free() function to free the memory allocated by pcre2_maketables(). At first sight it's strange that 10da030ab75 (grep: avoid leak of chartables in PCRE2, 2019-10-16) which added the free() call here doesn't make use of the pcre2_free() the author introduced in the preceding commit in 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16). The reason is that at the time the function didn't exist. It was first introduced in PCREv2 version 10.34, released on 2019-11-21. Let's make use of it behind a macro. I don't think this matters for anything to do with custom allocators, but it makes our use of PCREv2 more discoverable. At some distant point in the future we'll be able to drop the version guard, as nobody will be running a version older than 10.34. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 4 ++++ grep.h | 3 +++ 2 files changed, 7 insertions(+) diff --git a/grep.c b/grep.c index e58044474dc..c63dbff4b24 100644 --- a/grep.c +++ b/grep.c @@ -490,7 +490,11 @@ static void free_pcre2_pattern(struct grep_pat *p) pcre2_compile_context_free(p->pcre2_compile_context); pcre2_code_free(p->pcre2_pattern); pcre2_match_data_free(p->pcre2_match_data); +#ifdef GIT_PCRE2_VERSION_10_34_OR_HIGHER + pcre2_maketables_free(pcre2_global_context, p->pcre2_tables); +#else free((void *)p->pcre2_tables); +#endif } #else /* !USE_LIBPCRE2 */ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) diff --git a/grep.h b/grep.h index 54e52042cb9..64666e92049 100644 --- a/grep.h +++ b/grep.h @@ -7,6 +7,9 @@ #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 #define GIT_PCRE2_VERSION_10_36_OR_HIGHER #endif +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11 +#define GIT_PCRE2_VERSION_10_34_OR_HIGHER +#endif #else typedef int pcre2_code; typedef int pcre2_match_data; -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v2 08/10] grep/pcre2: actually make pcre2 use custom allocator 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (8 preceding siblings ...) 2021-02-18 0:07 ` [PATCH v2 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-03-04 0:24 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 09/10] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason 11 siblings, 1 reply; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Continue work started in 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16) and make PCREv2 use our pcre2_{malloc,free}(). functions for allocation. We'll now use it for all PCREv2 allocations. The reason 513f2b0bbd4 worked as a bugfix for the USE_NED_ALLOCATOR issue is because it targeted the allocation freed via free(), as opposed to by a pcre2_*free() function. I.e. the pcre2_maketables() and pcre2_maketables_free() pair. For most of the rest we continued allocating with stock malloc() inside PCREv2 itself, but didn't segfault because we'd use its corresponding free(). In a preceding commit of mine I changed the free() to pcre2_maketables_free() on versions of PCREv2 10.34 and newer. So as far as fixing the segfault goes we could revert 513f2b0bbd4. But then we wouldn't use the desired allocator, let's just use it instead. Before this patch we'd on e.g.: grep --threads=1 -iP æ.*var.*xyz Only use pcre2_{malloc,free}() for 2 malloc() calls and 2 corresponding free() calls. Now it's 12 calls to each. This can be observed with the GREP_PCRE2_DEBUG_MALLOC debug mode. Reading the history of how this bug got introduced it wasn't present in Johannes's original patch[1] to fix the issue. My reading of that thread is that the approach the follow-up patches to Johannes's original pursued were based on misunderstanding of how the PCREv2 API works. In particular this part of [2]: "most of the time (like when using UTF-8) the chartable (and therefore the global context) is not needed (even when using alternate allocators)" That's simply not how PCREv2 memory allocation works. It's easy to see how the misunderstanding came about. It's because (as noted above) the issue was noticed because of our use of free() in our own grep.c for freeing the memory allocated by pcre2_maketables(). Thus the misunderstanding that PCREv2's compile context is something only needed for pcre2_maketables(), and e.g. an aborted earlier attempt[3] to only set it up when we ourselves called pcre2_maketables(). That's not what PCREv2's compile context is. To quote PCREv2's documentation: "This context just contains pointers to (and data for) external memory management functions that are called from several places in the PCRE2 library." Thus the failed attempts to go down the route of only creating the general context in cases where we ourselves call pcre2_maketables(), before finally settling on the approach 513f2b0bbd4 took of always creating it, but then mostly not using it. Instead we should always create it, and then pass the general context to those functions that accept it, so that they'll consistently use our preferred memory allocation functions. 1. https://lore.kernel.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ 2. https://lore.kernel.org/git/CAPUEsphMh_ZqcH3M7PXC9jHTfEdQN3mhTAK2JDkdvKBp53YBoA@mail.gmail.com/ 3. https://lore.kernel.org/git/20190806085014.47776-3-carenas@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/grep.c b/grep.c index c63dbff4b24..0116ff5f09f 100644 --- a/grep.c +++ b/grep.c @@ -390,7 +390,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt if (!pcre2_global_context) BUG("pcre2_global_context uninitialized"); p->pcre2_tables = pcre2_maketables(pcre2_global_context); - p->pcre2_compile_context = pcre2_compile_context_create(NULL); + p->pcre2_compile_context = pcre2_compile_context_create(pcre2_global_context); pcre2_set_character_tables(p->pcre2_compile_context, p->pcre2_tables); } @@ -411,7 +411,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_compile_context); if (p->pcre2_pattern) { - p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL); + p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, pcre2_global_context); if (!p->pcre2_match_data) die("Couldn't allocate PCRE2 match data"); } else { -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v2 08/10] grep/pcre2: actually make pcre2 use custom allocator 2021-02-18 0:07 ` [PATCH v2 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason @ 2021-03-04 0:24 ` Junio C Hamano 0 siblings, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2021-03-04 0:24 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Continue work started in 513f2b0bbd4 (grep: make PCRE2 aware of custom > allocator, 2019-10-16) and make PCREv2 use our pcre2_{malloc,free}(). > functions for allocation. We'll now use it for all PCREv2 allocations. > > The reason 513f2b0bbd4 worked as a bugfix for the USE_NED_ALLOCATOR > issue is because it targeted the allocation freed via free(), as > opposed to by a pcre2_*free() function. I.e. the pcre2_maketables() > and pcre2_maketables_free() pair. > > For most of the rest we continued allocating with stock malloc() > inside PCREv2 itself, but didn't segfault because we'd use its > corresponding free(). > > In a preceding commit of mine I changed the free() to > pcre2_maketables_free() on versions of PCREv2 10.34 and newer. So as > far as fixing the segfault goes ... Wait, wait. So, because of the previous step, we would start segfaulting and we need to fix that breakage, which is the reason why this commit exists? If so, ... > we could revert 513f2b0bbd4. But then > we wouldn't use the desired allocator, let's just use it instead. ... I agree with the conclusion that both the previous step and this step are needed and better than a reversion of 513f2b0b (grep: make PCRE2 aware of custom allocator, 2019-10-16) and the previou step. But even then, it feels somewhat backwards. Shouldn't this step come first, so that we would be using a matching alloc/free pair, and then do the previous step? > Instead we should always create it, and then pass the general context > to those functions that accept it, so that they'll consistently use > our preferred memory allocation functions. Thanks. ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v2 09/10] grep/pcre2: move back to thread-only PCREv2 structures 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (9 preceding siblings ...) 2021-02-18 0:07 ` [PATCH v2 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-03-04 0:27 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason 11 siblings, 1 reply; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Change the setup of the "pcre2_general_context" to happen per-thread in compile_pcre2_pattern() instead of in grep_init(). This change brings it in line with how the rest of the pcre2_* members in the grep_pat structure are set up. As noted in the preceding commit the approach 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16) took to allocate the pcre2_general_context seems to have been initially based on a misunderstanding of how PCREv2 memory allocation works. The approach of creating a global context in grep_init() is just added complexity for almost zero gain. On my system it's 24 bytes saved per-thread. For comparison PCREv2 will then go on to allocate at least a kilobyte for its own thread-local state. As noted in 6d423dd542f (grep: don't redundantly compile throwaway patterns under threading, 2017-05-25) the grep code is intentionally not trying to micro-optimize allocations by e.g. sharing some PCREv2 structures globally, while making others thread-local. So let's remove this special case and make all of them thread-local again for simplicity. With this change we could move the pcre2_{malloc,free} functions around to live closer to their current use. I'm not doing that here to keep this change small, that cleanup will be done in a follow-up commit. See also the discussion in 94da9193a6 (grep: add support for PCRE v2, 2017-06-01) about thread safety, and Johannes's comments[1] to the effect that we should be doing what this patch is doing. 1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.1908052120302.46@tvgsbejvaqbjf.bet/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/grep.c | 1 - grep.c | 41 +++++++++++++++-------------------------- grep.h | 3 ++- 3 files changed, 17 insertions(+), 28 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 55d06c95130..c69fe993406 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1175,6 +1175,5 @@ int cmd_grep(int argc, const char **argv, const char *prefix) run_pager(&opt, prefix); clear_pathspec(&pathspec); free_grep_patterns(&opt); - grep_destroy(); return !hit; } diff --git a/grep.c b/grep.c index 0116ff5f09f..2599f329cd6 100644 --- a/grep.c +++ b/grep.c @@ -41,7 +41,6 @@ static struct grep_opt grep_defaults = { }; #ifdef USE_LIBPCRE2 -static pcre2_general_context *pcre2_global_context; #define GREP_PCRE2_DEBUG_MALLOC 0 static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) @@ -163,20 +162,9 @@ int grep_config(const char *var, const char *value, void *cb) * Initialize one instance of grep_opt and copy the * default values from the template we read the configuration * information in an earlier call to git_config(grep_config). - * - * If using PCRE, make sure that the library is configured - * to use the same allocator as Git (e.g. nedmalloc on Windows). - * - * Any allocated memory needs to be released in grep_destroy(). */ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix) { -#if defined(USE_LIBPCRE2) - if (!pcre2_global_context) - pcre2_global_context = pcre2_general_context_create( - pcre2_malloc, pcre2_free, NULL); -#endif - *opt = grep_defaults; opt->repo = repo; @@ -186,13 +174,6 @@ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix opt->header_tail = &opt->header_list; } -void grep_destroy(void) -{ -#ifdef USE_LIBPCRE2 - pcre2_general_context_free(pcre2_global_context); -#endif -} - static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt) { /* @@ -384,13 +365,20 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt int patinforet; size_t jitsizearg; - /* pcre2_global_context is initialized in grep_init */ + /* + * Call pcre2_general_context_create() before calling any + * other pcre2_*(). It sets up our malloc()/free() functions + * with which everything else is allocated. + */ + p->pcre2_general_context = pcre2_general_context_create( + pcre2_malloc, pcre2_free, NULL); + if (!p->pcre2_general_context) + die("Couldn't allocate PCRE2 general context"); + if (opt->ignore_case) { if (!opt->ignore_locale && has_non_ascii(p->pattern)) { - if (!pcre2_global_context) - BUG("pcre2_global_context uninitialized"); - p->pcre2_tables = pcre2_maketables(pcre2_global_context); - p->pcre2_compile_context = pcre2_compile_context_create(pcre2_global_context); + p->pcre2_tables = pcre2_maketables(p->pcre2_general_context); + p->pcre2_compile_context = pcre2_compile_context_create(p->pcre2_general_context); pcre2_set_character_tables(p->pcre2_compile_context, p->pcre2_tables); } @@ -411,7 +399,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_compile_context); if (p->pcre2_pattern) { - p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, pcre2_global_context); + p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, p->pcre2_general_context); if (!p->pcre2_match_data) die("Couldn't allocate PCRE2 match data"); } else { @@ -491,10 +479,11 @@ static void free_pcre2_pattern(struct grep_pat *p) pcre2_code_free(p->pcre2_pattern); pcre2_match_data_free(p->pcre2_match_data); #ifdef GIT_PCRE2_VERSION_10_34_OR_HIGHER - pcre2_maketables_free(pcre2_global_context, p->pcre2_tables); + pcre2_maketables_free(p->pcre2_general_context, p->pcre2_tables); #else free((void *)p->pcre2_tables); #endif + pcre2_general_context_free(p->pcre2_general_context); } #else /* !USE_LIBPCRE2 */ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) diff --git a/grep.h b/grep.h index 64666e92049..72f82b1e302 100644 --- a/grep.h +++ b/grep.h @@ -14,6 +14,7 @@ typedef int pcre2_code; typedef int pcre2_match_data; typedef int pcre2_compile_context; +typedef int pcre2_general_context; #endif #ifndef PCRE2_MATCH_INVALID_UTF /* PCRE2_MATCH_* dummy also with !USE_LIBPCRE2, for test-pcre2-config.c */ @@ -75,6 +76,7 @@ struct grep_pat { pcre2_code *pcre2_pattern; pcre2_match_data *pcre2_match_data; pcre2_compile_context *pcre2_compile_context; + pcre2_general_context *pcre2_general_context; const uint8_t *pcre2_tables; uint32_t pcre2_jit_on; unsigned fixed:1; @@ -167,7 +169,6 @@ struct grep_opt { int grep_config(const char *var, const char *value, void *); void grep_init(struct grep_opt *, struct repository *repo, const char *prefix); -void grep_destroy(void); void grep_commit_pattern_type(enum grep_pattern_type, struct grep_opt *opt); void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t); -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v2 09/10] grep/pcre2: move back to thread-only PCREv2 structures 2021-02-18 0:07 ` [PATCH v2 09/10] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason @ 2021-03-04 0:27 ` Junio C Hamano 0 siblings, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2021-03-04 0:27 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > So let's remove this special case and make all of them thread-local > again for simplicity. With this change we could move the > pcre2_{malloc,free} functions around to live closer to their current > use. I'm not doing that here to keep this change small, that cleanup > will be done in a follow-up commit. Nice. ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v2 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason ` (10 preceding siblings ...) 2021-02-18 0:07 ` [PATCH v2 09/10] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason 2021-03-04 0:28 ` Junio C Hamano 11 siblings, 1 reply; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-18 0:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Move the definitions of the pcre2_{malloc,free} functions above the compile_pcre2_pattern() function they're used in. Before the preceding commit they used to be needed earlier, but now we can move them to be adjacent to the other PCREv2 functions. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 46 ++++++++++++++++++++++------------------------ 1 file changed, 22 insertions(+), 24 deletions(-) diff --git a/grep.c b/grep.c index 2599f329cd6..636ac48bf07 100644 --- a/grep.c +++ b/grep.c @@ -40,30 +40,6 @@ static struct grep_opt grep_defaults = { .output = std_output, }; -#ifdef USE_LIBPCRE2 -#define GREP_PCRE2_DEBUG_MALLOC 0 - -static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) -{ - void *pointer = malloc(size); -#if GREP_PCRE2_DEBUG_MALLOC - static int count = 1; - fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size); -#endif - return pointer; -} - -static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) -{ -#if GREP_PCRE2_DEBUG_MALLOC - static int count = 1; - if (pointer) - fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++); -#endif - free(pointer); -} -#endif - static const char *color_grep_slots[] = { [GREP_COLOR_CONTEXT] = "context", [GREP_COLOR_FILENAME] = "filename", @@ -355,6 +331,28 @@ static int is_fixed(const char *s, size_t len) } #ifdef USE_LIBPCRE2 +#define GREP_PCRE2_DEBUG_MALLOC 0 + +static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) +{ + void *pointer = malloc(size); +#if GREP_PCRE2_DEBUG_MALLOC + static int count = 1; + fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size); +#endif + return pointer; +} + +static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) +{ +#if GREP_PCRE2_DEBUG_MALLOC + static int count = 1; + if (pointer) + fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++); +#endif + free(pointer); +} + static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) { int error; -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v2 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} 2021-02-18 0:07 ` [PATCH v2 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason @ 2021-03-04 0:28 ` Junio C Hamano 0 siblings, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2021-03-04 0:28 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Move the definitions of the pcre2_{malloc,free} functions above the > compile_pcre2_pattern() function they're used in. > > Before the preceding commit they used to be needed earlier, but now we > can move them to be adjacent to the other PCREv2 functions. Yes, much nicer. ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-18 1:38 ` Junio C Hamano 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 02/10] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason ` (8 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Drop an assignment added in b65abcafc7a (grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) and the overly cautious assert() I added in 94da9193a6e (grep: add support for PCRE v2, 2017-06-01). There was never a good reason for this, it's just a relic from when I initially wrote the PCREv2 support. We're not going to have confusion about compile_pcre2_pattern() being called when it shouldn't just because we forgot to cargo-cult this opt->pcre2 option, and "opt" is (mostly) used for the options the user supplied, let's avoid the pattern of needlessly assigning to it. With my in-flight removal of PCREv1 [1] ("Remove support for v1 of the PCRE library", 2021-01-24) there'll be even less confusion around what we call where in these codepaths, which is one more reason to remove this. 1. https://lore.kernel.org/git/xmqqmtwy29x8.fsf@gitster.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/grep.c b/grep.c index aabfaaa4c3..816e23f17e 100644 --- a/grep.c +++ b/grep.c @@ -373,8 +373,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt int patinforet; size_t jitsizearg; - assert(opt->pcre2); - p->pcre2_compile_context = NULL; /* pcre2_global_context is initialized in append_grep_pattern */ @@ -555,7 +553,6 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) #endif if (p->fixed || p->is_fixed) { #ifdef USE_LIBPCRE2 - opt->pcre2 = 1; if (p->is_fixed) { compile_pcre2_pattern(p, opt); } else { -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 02/10] grep/pcre2: drop needless assignment to NULL 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (2 preceding siblings ...) 2021-02-04 21:05 ` [PATCH 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 03/10] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason ` (7 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Remove a redundant assignment of pcre2_compile_context dating back to my 94da9193a6e (grep: add support for PCRE v2, 2017-06-01). In create_grep_pat() we xcalloc() the "grep_pat" struct, so there's no need to NULL out individual members here. I think this was probably something left over from an earlier development version of mine. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/grep.c b/grep.c index 816e23f17e..f27c5de7f5 100644 --- a/grep.c +++ b/grep.c @@ -373,8 +373,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt int patinforet; size_t jitsizearg; - p->pcre2_compile_context = NULL; - /* pcre2_global_context is initialized in append_grep_pattern */ if (opt->ignore_case) { if (!opt->ignore_locale && has_non_ascii(p->pattern)) { -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 03/10] grep/pcre2: correct reference to grep_init() in comment 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (3 preceding siblings ...) 2021-02-04 21:05 ` [PATCH 02/10] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason ` (6 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Correct a comment added in 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16). This comment was never correct in git.git, but was consistent with an older version of the patch[1]. 1. https://lore.kernel.org/git/20190806163658.66932-3-carenas@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/grep.c b/grep.c index f27c5de7f5..b9adcd83e7 100644 --- a/grep.c +++ b/grep.c @@ -373,7 +373,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt int patinforet; size_t jitsizearg; - /* pcre2_global_context is initialized in append_grep_pattern */ + /* pcre2_global_context is initialized in grep_init */ if (opt->ignore_case) { if (!opt->ignore_locale && has_non_ascii(p->pattern)) { if (!pcre2_global_context) -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (4 preceding siblings ...) 2021-02-04 21:05 ` [PATCH 03/10] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason ` (5 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Change pcre2_malloc() in a way that'll make it easier for a debugging fprintf() to spew out the allocated pointer. This doesn't introduce any functional change, it just makes a subsequent commit's diff easier to read. Changes code added in 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/grep.c b/grep.c index b9adcd83e7..f96d86c929 100644 --- a/grep.c +++ b/grep.c @@ -45,7 +45,8 @@ static pcre2_general_context *pcre2_global_context; static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) { - return malloc(size); + void *pointer = malloc(size); + return pointer; } static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (5 preceding siblings ...) 2021-02-04 21:05 ` [PATCH 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-10 10:38 ` Johannes Schindelin 2021-02-04 21:05 ` [PATCH 06/10] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason ` (4 subsequent siblings) 11 siblings, 1 reply; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Add optional printing of PCREv2 allocations to stderr for a developer who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to "1". This will be referenced a subsequent commit, and is generally useful to manually see what's going on with PCREv2 allocations while working on that code. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/grep.c b/grep.c index f96d86c929..7d262a23d8 100644 --- a/grep.c +++ b/grep.c @@ -42,15 +42,25 @@ static struct grep_opt grep_defaults = { #ifdef USE_LIBPCRE2 static pcre2_general_context *pcre2_global_context; +#define GREP_PCRE2_DEBUG_MALLOC 0 static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) { void *pointer = malloc(size); +#if GREP_PCRE2_DEBUG_MALLOC + static int count = 1; + fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size); +#endif return pointer; } static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) { +#if GREP_PCRE2_DEBUG_MALLOC + static int count = 1; + if (pointer) + fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++); +#endif free(pointer); } #endif -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode 2021-02-04 21:05 ` [PATCH 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason @ 2021-02-10 10:38 ` Johannes Schindelin 0 siblings, 0 replies; 42+ messages in thread From: Johannes Schindelin @ 2021-02-10 10:38 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Jeff King, Carlo Marcelo Arenas Belón [-- Attachment #1: Type: text/plain, Size: 1677 bytes --] Hi Ævar, On Thu, 4 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > Add optional printing of PCREv2 allocations to stderr for a developer > who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to > "1". Maybe clarify in the oneline that this is not an environment variable, but a Makefile knob? I had to read all the way to the diff to understand that aspect. Otherwise, the patch series so far looks really fine to me. Thanks, Dscho > > This will be referenced a subsequent commit, and is generally useful > to manually see what's going on with PCREv2 allocations while working > on that code. > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > grep.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/grep.c b/grep.c > index f96d86c929..7d262a23d8 100644 > --- a/grep.c > +++ b/grep.c > @@ -42,15 +42,25 @@ static struct grep_opt grep_defaults = { > > #ifdef USE_LIBPCRE2 > static pcre2_general_context *pcre2_global_context; > +#define GREP_PCRE2_DEBUG_MALLOC 0 > > static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) > { > void *pointer = malloc(size); > +#if GREP_PCRE2_DEBUG_MALLOC > + static int count = 1; > + fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size); > +#endif > return pointer; > } > > static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) > { > +#if GREP_PCRE2_DEBUG_MALLOC > + static int count = 1; > + if (pointer) > + fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++); > +#endif > free(pointer); > } > #endif > -- > 2.30.0.284.gd98b1dd5eaa7 > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 06/10] grep/pcre2: use compile-time PCREv2 version test 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (6 preceding siblings ...) 2021-02-04 21:05 ` [PATCH 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason ` (3 subsequent siblings) 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Replace a use of pcre2_config(PCRE2_CONFIG_VERSION, ...) which I added in 95ca1f987ed (grep/pcre2: better support invalid UTF-8 haystacks, 2021-01-24) with the same test done at compile-time. It might be cuter to do this at runtime since we don't have to do the "major >= 11 || (major >= 10 && ...)" test. But in the next commit we'll add another version comparison that absolutely needs to be done at compile-time, so we're better of being consistent across the board. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 18 ++++-------------- grep.h | 3 +++ 2 files changed, 7 insertions(+), 14 deletions(-) diff --git a/grep.c b/grep.c index 7d262a23d8..e58044474d 100644 --- a/grep.c +++ b/grep.c @@ -400,21 +400,11 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt !(!opt->ignore_case && (p->fixed || p->is_fixed))) options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF); +#ifdef GIT_PCRE2_VERSION_10_36_OR_HIGHER /* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */ - if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS)) { - struct strbuf buf; - int len; - int err; - - if ((len = pcre2_config(PCRE2_CONFIG_VERSION, NULL)) < 0) - BUG("pcre2_config(..., NULL) failed: %d", len); - strbuf_init(&buf, len + 1); - if ((err = pcre2_config(PCRE2_CONFIG_VERSION, buf.buf)) < 0) - BUG("pcre2_config(..., buf.buf) failed: %d", err); - if (versioncmp(buf.buf, "10.36") < 0) - options |= PCRE2_NO_START_OPTIMIZE; - strbuf_release(&buf); - } + if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS)) + options |= PCRE2_NO_START_OPTIMIZE; +#endif p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern, p->patternlen, options, &error, &erroffset, diff --git a/grep.h b/grep.h index ae89d6254b..54e52042cb 100644 --- a/grep.h +++ b/grep.h @@ -4,6 +4,9 @@ #ifdef USE_LIBPCRE2 #define PCRE2_CODE_UNIT_WIDTH 8 #include <pcre2.h> +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 +#define GIT_PCRE2_VERSION_10_36_OR_HIGHER +#endif #else typedef int pcre2_code; typedef int pcre2_match_data; -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 07/10] grep/pcre2: use pcre2_maketables_free() function 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (7 preceding siblings ...) 2021-02-04 21:05 ` [PATCH 06/10] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-10 10:43 ` Johannes Schindelin 2021-03-04 0:16 ` Junio C Hamano 2021-02-04 21:05 ` [PATCH 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason ` (2 subsequent siblings) 11 siblings, 2 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Make use of the pcre2_maketables_free() function to free the memory allocated by pcre2_maketables(). At first sight it's strange that 10da030ab75 (grep: avoid leak of chartables in PCRE2, 2019-10-16) which added the free() call here doesn't make use of the pcre2_free() the author introduced in the preceding commit in 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16). The reason is that at the time the function didn't exist. It was first introduced in PCREv2 version 10.34, released on 2019-11-21. Let's make use of it behind a macro. I don't think this matters for anything to do with custom allocators, but it makes our use of PCREv2 more discoverable. At some distant point in the future we'll be able to drop the version guard, as nobody will be running a version older than 10.34. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 4 ++++ grep.h | 3 +++ 2 files changed, 7 insertions(+) diff --git a/grep.c b/grep.c index e58044474d..c63dbff4b2 100644 --- a/grep.c +++ b/grep.c @@ -490,7 +490,11 @@ static void free_pcre2_pattern(struct grep_pat *p) pcre2_compile_context_free(p->pcre2_compile_context); pcre2_code_free(p->pcre2_pattern); pcre2_match_data_free(p->pcre2_match_data); +#ifdef GIT_PCRE2_VERSION_10_34_OR_HIGHER + pcre2_maketables_free(pcre2_global_context, p->pcre2_tables); +#else free((void *)p->pcre2_tables); +#endif } #else /* !USE_LIBPCRE2 */ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) diff --git a/grep.h b/grep.h index 54e52042cb..64666e9204 100644 --- a/grep.h +++ b/grep.h @@ -7,6 +7,9 @@ #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 #define GIT_PCRE2_VERSION_10_36_OR_HIGHER #endif +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11 +#define GIT_PCRE2_VERSION_10_34_OR_HIGHER +#endif #else typedef int pcre2_code; typedef int pcre2_match_data; -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 07/10] grep/pcre2: use pcre2_maketables_free() function 2021-02-04 21:05 ` [PATCH 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason @ 2021-02-10 10:43 ` Johannes Schindelin 2021-03-04 0:16 ` Junio C Hamano 1 sibling, 0 replies; 42+ messages in thread From: Johannes Schindelin @ 2021-02-10 10:43 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Jeff King, Carlo Marcelo Arenas Belón [-- Attachment #1: Type: text/plain, Size: 2286 bytes --] Hi Ævar, On Thu, 4 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > Make use of the pcre2_maketables_free() function to free the memory > allocated by pcre2_maketables(). At first sight it's strange that > 10da030ab75 (grep: avoid leak of chartables in PCRE2, 2019-10-16) > which added the free() call here doesn't make use of the pcre2_free() > the author introduced in the preceding commit in 513f2b0bbd4 (grep: > make PCRE2 aware of custom allocator, 2019-10-16). > > The reason is that at the time the function didn't exist. It was first > introduced in PCREv2 version 10.34, released on 2019-11-21. Git for Windows still uses v10.33. Thanks for the prod, I will update the library. Ciao, Dscho > > Let's make use of it behind a macro. I don't think this matters for > anything to do with custom allocators, but it makes our use of PCREv2 > more discoverable. At some distant point in the future we'll be able > to drop the version guard, as nobody will be running a version older > than 10.34. > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > grep.c | 4 ++++ > grep.h | 3 +++ > 2 files changed, 7 insertions(+) > > diff --git a/grep.c b/grep.c > index e58044474d..c63dbff4b2 100644 > --- a/grep.c > +++ b/grep.c > @@ -490,7 +490,11 @@ static void free_pcre2_pattern(struct grep_pat *p) > pcre2_compile_context_free(p->pcre2_compile_context); > pcre2_code_free(p->pcre2_pattern); > pcre2_match_data_free(p->pcre2_match_data); > +#ifdef GIT_PCRE2_VERSION_10_34_OR_HIGHER > + pcre2_maketables_free(pcre2_global_context, p->pcre2_tables); > +#else > free((void *)p->pcre2_tables); > +#endif > } > #else /* !USE_LIBPCRE2 */ > static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) > diff --git a/grep.h b/grep.h > index 54e52042cb..64666e9204 100644 > --- a/grep.h > +++ b/grep.h > @@ -7,6 +7,9 @@ > #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 > #define GIT_PCRE2_VERSION_10_36_OR_HIGHER > #endif > +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11 > +#define GIT_PCRE2_VERSION_10_34_OR_HIGHER > +#endif > #else > typedef int pcre2_code; > typedef int pcre2_match_data; > -- > 2.30.0.284.gd98b1dd5eaa7 > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 07/10] grep/pcre2: use pcre2_maketables_free() function 2021-02-04 21:05 ` [PATCH 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason 2021-02-10 10:43 ` Johannes Schindelin @ 2021-03-04 0:16 ` Junio C Hamano 1 sibling, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2021-03-04 0:16 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Make use of the pcre2_maketables_free() function to free the memory > allocated by pcre2_maketables(). At first sight it's strange that > 10da030ab75 (grep: avoid leak of chartables in PCRE2, 2019-10-16) > which added the free() call here doesn't make use of the pcre2_free() > the author introduced in the preceding commit in 513f2b0bbd4 (grep: > make PCRE2 aware of custom allocator, 2019-10-16). > > The reason is that at the time the function didn't exist. It was first > introduced in PCREv2 version 10.34, released on 2019-11-21. > > Let's make use of it behind a macro. I don't think this matters for > anything to do with custom allocators, but it makes our use of PCREv2 > more discoverable. At some distant point in the future we'll be able > to drop the version guard, as nobody will be running a version older > than 10.34. OK. The same comment about the macro that happens to be OK only because 10 and 11 are consecutive applies here. > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > grep.c | 4 ++++ > grep.h | 3 +++ > 2 files changed, 7 insertions(+) > > diff --git a/grep.c b/grep.c > index e58044474d..c63dbff4b2 100644 > --- a/grep.c > +++ b/grep.c > @@ -490,7 +490,11 @@ static void free_pcre2_pattern(struct grep_pat *p) > pcre2_compile_context_free(p->pcre2_compile_context); > pcre2_code_free(p->pcre2_pattern); > pcre2_match_data_free(p->pcre2_match_data); > +#ifdef GIT_PCRE2_VERSION_10_34_OR_HIGHER > + pcre2_maketables_free(pcre2_global_context, p->pcre2_tables); > +#else > free((void *)p->pcre2_tables); > +#endif > } > #else /* !USE_LIBPCRE2 */ > static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) > diff --git a/grep.h b/grep.h > index 54e52042cb..64666e9204 100644 > --- a/grep.h > +++ b/grep.h > @@ -7,6 +7,9 @@ > #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 > #define GIT_PCRE2_VERSION_10_36_OR_HIGHER > #endif > +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11 > +#define GIT_PCRE2_VERSION_10_34_OR_HIGHER > +#endif > #else > typedef int pcre2_code; > typedef int pcre2_match_data; ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 08/10] grep/pcre2: actually make pcre2 use custom allocator 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (8 preceding siblings ...) 2021-02-04 21:05 ` [PATCH 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-10 12:38 ` Johannes Schindelin 2021-02-04 21:05 ` [PATCH 09/10] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason 11 siblings, 1 reply; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Continue work started in 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16) and make PCREv2 use our pcre2_{malloc,free}(). functions for allocation. We'll now use it for all PCREv2 allocations. The reason 513f2b0bbd4 worked as a bugfix for the USE_NED_ALLOCATOR issue is because it managed to target pretty much the allocation freed via free(), as opposed to by a pcre2_*free() function. I.e. the pcre2_maketables() and pcre2_maketables_free() pair. For most of the rest we continued allocating with stock malloc() inside PCREv2 itself, but didn't segfault because we'd use its corresponding free(). In a preceding commit of mine I changed the free() to pcre2_maketables_free() on versions of PCREv2 10.34 and newer. So as far as fixing the segfault goes we could revert 513f2b0bbd4. But then we wouldn't use the desired allocator, let's just use it instead. Before this patch we'd on e.g.: grep --threads=1 -iP æ.*var.*xyz Only use pcre2_{malloc,free}() for 2 malloc() calls and 2 corresponding free() call. Now it's 12 calls to each. This can be observed with the GREP_PCRE2_DEBUG_MALLOC debug mode. Reading the history of how this bug got introduced it wasn't present in Johannes's original patch[1] to fix the issue. My reading of that thread is that the approach the follow-up patches to Johannes's original pursued were based on misunderstanding of how the PCREv2 API works. In particular this part of [2]: "most of the time (like when using UTF-8) the chartable (and therefore the global context) is not needed (even when using alternate allocators)" That's simply not how PCREv2 memory allocation works. It's easy to see how the misunderstanding came about. It's because (as noted above) the issue was noticed because of our use of free() in our own grep.c for freeing the memory allocated by pcre2_maketables(). Thus the misunderstanding that PCREv2's compile context is something only needed for pcre2_maketables(), and e.g. an aborted earlier attempt[3] to only set it up when we ourselves called pcre2_maketables(). That's not what PCREv2's compile context is. To quote PCREv2's documentation: "This context just contains pointers to (and data for) external memory management functions that are called from several places in the PCRE2 library." Thus the failed attempts to go down the route of only creating the general context in cases where we ourselves call pcre2_maketables(), before finally settling on the approach 513f2b0bbd4 took of always creating it. Instead we should always create it, and then pass the general context to those functions that accept it, so that they'll consistently use our preferred memory allocation functions. 1. https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ 2. https://lore.kernel.org/git/CAPUEsphMh_ZqcH3M7PXC9jHTfEdQN3mhTAK2JDkdvKBp53YBoA@mail.gmail.com/ 3. https://lore.kernel.org/git/20190806085014.47776-3-carenas@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/grep.c b/grep.c index c63dbff4b2..0116ff5f09 100644 --- a/grep.c +++ b/grep.c @@ -390,7 +390,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt if (!pcre2_global_context) BUG("pcre2_global_context uninitialized"); p->pcre2_tables = pcre2_maketables(pcre2_global_context); - p->pcre2_compile_context = pcre2_compile_context_create(NULL); + p->pcre2_compile_context = pcre2_compile_context_create(pcre2_global_context); pcre2_set_character_tables(p->pcre2_compile_context, p->pcre2_tables); } @@ -411,7 +411,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_compile_context); if (p->pcre2_pattern) { - p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL); + p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, pcre2_global_context); if (!p->pcre2_match_data) die("Couldn't allocate PCRE2 match data"); } else { -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 08/10] grep/pcre2: actually make pcre2 use custom allocator 2021-02-04 21:05 ` [PATCH 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason @ 2021-02-10 12:38 ` Johannes Schindelin 0 siblings, 0 replies; 42+ messages in thread From: Johannes Schindelin @ 2021-02-10 12:38 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Jeff King, Carlo Marcelo Arenas Belón [-- Attachment #1: Type: text/plain, Size: 4559 bytes --] Hi Ævar, ACK! And thank you for this patch, Dscho On Thu, 4 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > Continue work started in 513f2b0bbd4 (grep: make PCRE2 aware of custom > allocator, 2019-10-16) and make PCREv2 use our pcre2_{malloc,free}(). > functions for allocation. We'll now use it for all PCREv2 allocations. > > The reason 513f2b0bbd4 worked as a bugfix for the USE_NED_ALLOCATOR > issue is because it managed to target pretty much the allocation freed > via free(), as opposed to by a pcre2_*free() function. I.e. the > pcre2_maketables() and pcre2_maketables_free() pair. For most of the > rest we continued allocating with stock malloc() inside PCREv2 itself, > but didn't segfault because we'd use its corresponding free(). > > In a preceding commit of mine I changed the free() to > pcre2_maketables_free() on versions of PCREv2 10.34 and newer. So as > far as fixing the segfault goes we could revert 513f2b0bbd4. But then > we wouldn't use the desired allocator, let's just use it instead. > > Before this patch we'd on e.g.: > > grep --threads=1 -iP æ.*var.*xyz > > Only use pcre2_{malloc,free}() for 2 malloc() calls and 2 > corresponding free() call. Now it's 12 calls to each. This can be > observed with the GREP_PCRE2_DEBUG_MALLOC debug mode. > > Reading the history of how this bug got introduced it wasn't present > in Johannes's original patch[1] to fix the issue. > > My reading of that thread is that the approach the follow-up patches > to Johannes's original pursued were based on misunderstanding of how > the PCREv2 API works. In particular this part of [2]: > > "most of the time (like when using UTF-8) the chartable (and > therefore the global context) is not needed (even when using > alternate allocators)" > > That's simply not how PCREv2 memory allocation works. It's easy to see > how the misunderstanding came about. It's because (as noted above) the > issue was noticed because of our use of free() in our own grep.c for > freeing the memory allocated by pcre2_maketables(). > > Thus the misunderstanding that PCREv2's compile context is something > only needed for pcre2_maketables(), and e.g. an aborted earlier > attempt[3] to only set it up when we ourselves called > pcre2_maketables(). > > That's not what PCREv2's compile context is. To quote PCREv2's > documentation: > > "This context just contains pointers to (and data for) external > memory management functions that are called from several places in > the PCRE2 library." > > Thus the failed attempts to go down the route of only creating the > general context in cases where we ourselves call pcre2_maketables(), > before finally settling on the approach 513f2b0bbd4 took of always > creating it. > > Instead we should always create it, and then pass the general context > to those functions that accept it, so that they'll consistently use > our preferred memory allocation functions. > > 1. https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ > 2. https://lore.kernel.org/git/CAPUEsphMh_ZqcH3M7PXC9jHTfEdQN3mhTAK2JDkdvKBp53YBoA@mail.gmail.com/ > 3. https://lore.kernel.org/git/20190806085014.47776-3-carenas@gmail.com/ > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > grep.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/grep.c b/grep.c > index c63dbff4b2..0116ff5f09 100644 > --- a/grep.c > +++ b/grep.c > @@ -390,7 +390,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt > if (!pcre2_global_context) > BUG("pcre2_global_context uninitialized"); > p->pcre2_tables = pcre2_maketables(pcre2_global_context); > - p->pcre2_compile_context = pcre2_compile_context_create(NULL); > + p->pcre2_compile_context = pcre2_compile_context_create(pcre2_global_context); > pcre2_set_character_tables(p->pcre2_compile_context, > p->pcre2_tables); > } > @@ -411,7 +411,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt > p->pcre2_compile_context); > > if (p->pcre2_pattern) { > - p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL); > + p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, pcre2_global_context); > if (!p->pcre2_match_data) > die("Couldn't allocate PCRE2 match data"); > } else { > -- > 2.30.0.284.gd98b1dd5eaa7 > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 09/10] grep/pcre2: move back to thread-only PCREv2 structures 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (9 preceding siblings ...) 2021-02-04 21:05 ` [PATCH 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason 11 siblings, 0 replies; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Change the setup of the "pcre2_general_context" to happen per-thread in compile_pcre2_pattern() instead of in grep_init(), as happens with all the rest of the pcre2_* members of the grep_pat structure. As noted in the preceding commit the approach 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16) took to allocate the pcre2_general_context seems to have been initially based on a misunderstanding of how PCREv2 memory allocation works. This approach of creating a global context is just added complexity for almost zero gain. On my system it's 24 bytes saved per-thread, for context PCREv2 will then go on to some kilobytes for its own thread-local state. As noted in 6d423dd542f (grep: don't redundantly compile throwaway patterns under threading, 2017-05-25) the grep code is intentionally not trying to micro-optimize allocations by e.g. sharing some PCREv2 structures globally, while making others thread-local. So let's remove this special case and make all of them thread-local for simplicity again. See also the discussion in 94da9193a6 (grep: add support for PCRE v2, 2017-06-01) about thread safety, and Johannes's comments[1] to the effect that we should be doing what this patch is doing. 1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.1908052120302.46@tvgsbejvaqbjf.bet/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/grep.c | 1 - grep.c | 41 +++++++++++++++-------------------------- grep.h | 3 ++- 3 files changed, 17 insertions(+), 28 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 55d06c9513..c69fe99340 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1175,6 +1175,5 @@ int cmd_grep(int argc, const char **argv, const char *prefix) run_pager(&opt, prefix); clear_pathspec(&pathspec); free_grep_patterns(&opt); - grep_destroy(); return !hit; } diff --git a/grep.c b/grep.c index 0116ff5f09..2599f329cd 100644 --- a/grep.c +++ b/grep.c @@ -41,7 +41,6 @@ static struct grep_opt grep_defaults = { }; #ifdef USE_LIBPCRE2 -static pcre2_general_context *pcre2_global_context; #define GREP_PCRE2_DEBUG_MALLOC 0 static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) @@ -163,20 +162,9 @@ int grep_config(const char *var, const char *value, void *cb) * Initialize one instance of grep_opt and copy the * default values from the template we read the configuration * information in an earlier call to git_config(grep_config). - * - * If using PCRE, make sure that the library is configured - * to use the same allocator as Git (e.g. nedmalloc on Windows). - * - * Any allocated memory needs to be released in grep_destroy(). */ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix) { -#if defined(USE_LIBPCRE2) - if (!pcre2_global_context) - pcre2_global_context = pcre2_general_context_create( - pcre2_malloc, pcre2_free, NULL); -#endif - *opt = grep_defaults; opt->repo = repo; @@ -186,13 +174,6 @@ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix opt->header_tail = &opt->header_list; } -void grep_destroy(void) -{ -#ifdef USE_LIBPCRE2 - pcre2_general_context_free(pcre2_global_context); -#endif -} - static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt) { /* @@ -384,13 +365,20 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt int patinforet; size_t jitsizearg; - /* pcre2_global_context is initialized in grep_init */ + /* + * Call pcre2_general_context_create() before calling any + * other pcre2_*(). It sets up our malloc()/free() functions + * with which everything else is allocated. + */ + p->pcre2_general_context = pcre2_general_context_create( + pcre2_malloc, pcre2_free, NULL); + if (!p->pcre2_general_context) + die("Couldn't allocate PCRE2 general context"); + if (opt->ignore_case) { if (!opt->ignore_locale && has_non_ascii(p->pattern)) { - if (!pcre2_global_context) - BUG("pcre2_global_context uninitialized"); - p->pcre2_tables = pcre2_maketables(pcre2_global_context); - p->pcre2_compile_context = pcre2_compile_context_create(pcre2_global_context); + p->pcre2_tables = pcre2_maketables(p->pcre2_general_context); + p->pcre2_compile_context = pcre2_compile_context_create(p->pcre2_general_context); pcre2_set_character_tables(p->pcre2_compile_context, p->pcre2_tables); } @@ -411,7 +399,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_compile_context); if (p->pcre2_pattern) { - p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, pcre2_global_context); + p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, p->pcre2_general_context); if (!p->pcre2_match_data) die("Couldn't allocate PCRE2 match data"); } else { @@ -491,10 +479,11 @@ static void free_pcre2_pattern(struct grep_pat *p) pcre2_code_free(p->pcre2_pattern); pcre2_match_data_free(p->pcre2_match_data); #ifdef GIT_PCRE2_VERSION_10_34_OR_HIGHER - pcre2_maketables_free(pcre2_global_context, p->pcre2_tables); + pcre2_maketables_free(p->pcre2_general_context, p->pcre2_tables); #else free((void *)p->pcre2_tables); #endif + pcre2_general_context_free(p->pcre2_general_context); } #else /* !USE_LIBPCRE2 */ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) diff --git a/grep.h b/grep.h index 64666e9204..72f82b1e30 100644 --- a/grep.h +++ b/grep.h @@ -14,6 +14,7 @@ typedef int pcre2_code; typedef int pcre2_match_data; typedef int pcre2_compile_context; +typedef int pcre2_general_context; #endif #ifndef PCRE2_MATCH_INVALID_UTF /* PCRE2_MATCH_* dummy also with !USE_LIBPCRE2, for test-pcre2-config.c */ @@ -75,6 +76,7 @@ struct grep_pat { pcre2_code *pcre2_pattern; pcre2_match_data *pcre2_match_data; pcre2_compile_context *pcre2_compile_context; + pcre2_general_context *pcre2_general_context; const uint8_t *pcre2_tables; uint32_t pcre2_jit_on; unsigned fixed:1; @@ -167,7 +169,6 @@ struct grep_opt { int grep_config(const char *var, const char *value, void *); void grep_init(struct grep_opt *, struct repository *repo, const char *prefix); -void grep_destroy(void); void grep_commit_pattern_type(enum grep_pattern_type, struct grep_opt *opt); void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t); -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget ` (10 preceding siblings ...) 2021-02-04 21:05 ` [PATCH 09/10] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 ` Ævar Arnfjörð Bjarmason 2021-02-10 12:40 ` Johannes Schindelin 11 siblings, 1 reply; 42+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:05 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón, Ævar Arnfjörð Bjarmason Move the definitions of the pcre2_{malloc,free} functions above the compile_pcre2_pattern() function they're used it. Before the preceding commit they used to be needed earlier, but now we can move them to be adjacent to the other PCREv2 functions. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- grep.c | 46 ++++++++++++++++++++++------------------------ 1 file changed, 22 insertions(+), 24 deletions(-) diff --git a/grep.c b/grep.c index 2599f329cd..636ac48bf0 100644 --- a/grep.c +++ b/grep.c @@ -40,30 +40,6 @@ static struct grep_opt grep_defaults = { .output = std_output, }; -#ifdef USE_LIBPCRE2 -#define GREP_PCRE2_DEBUG_MALLOC 0 - -static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) -{ - void *pointer = malloc(size); -#if GREP_PCRE2_DEBUG_MALLOC - static int count = 1; - fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size); -#endif - return pointer; -} - -static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) -{ -#if GREP_PCRE2_DEBUG_MALLOC - static int count = 1; - if (pointer) - fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++); -#endif - free(pointer); -} -#endif - static const char *color_grep_slots[] = { [GREP_COLOR_CONTEXT] = "context", [GREP_COLOR_FILENAME] = "filename", @@ -355,6 +331,28 @@ static int is_fixed(const char *s, size_t len) } #ifdef USE_LIBPCRE2 +#define GREP_PCRE2_DEBUG_MALLOC 0 + +static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) +{ + void *pointer = malloc(size); +#if GREP_PCRE2_DEBUG_MALLOC + static int count = 1; + fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size); +#endif + return pointer; +} + +static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) +{ +#if GREP_PCRE2_DEBUG_MALLOC + static int count = 1; + if (pointer) + fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++); +#endif + free(pointer); +} + static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) { int error; -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} 2021-02-04 21:05 ` [PATCH 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason @ 2021-02-10 12:40 ` Johannes Schindelin 0 siblings, 0 replies; 42+ messages in thread From: Johannes Schindelin @ 2021-02-10 12:40 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Jeff King, Carlo Marcelo Arenas Belón [-- Attachment #1: Type: text/plain, Size: 2497 bytes --] Hi Ævar, On Thu, 4 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > Move the definitions of the pcre2_{malloc,free} functions above the > compile_pcre2_pattern() function they're used it. Before the preceding s/it/in/ Thank you for this entire patch series. I like its intention and its execution. Ciao, Dscho > commit they used to be needed earlier, but now we can move them to be > adjacent to the other PCREv2 functions. > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > grep.c | 46 ++++++++++++++++++++++------------------------ > 1 file changed, 22 insertions(+), 24 deletions(-) > > diff --git a/grep.c b/grep.c > index 2599f329cd..636ac48bf0 100644 > --- a/grep.c > +++ b/grep.c > @@ -40,30 +40,6 @@ static struct grep_opt grep_defaults = { > .output = std_output, > }; > > -#ifdef USE_LIBPCRE2 > -#define GREP_PCRE2_DEBUG_MALLOC 0 > - > -static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) > -{ > - void *pointer = malloc(size); > -#if GREP_PCRE2_DEBUG_MALLOC > - static int count = 1; > - fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size); > -#endif > - return pointer; > -} > - > -static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) > -{ > -#if GREP_PCRE2_DEBUG_MALLOC > - static int count = 1; > - if (pointer) > - fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++); > -#endif > - free(pointer); > -} > -#endif > - > static const char *color_grep_slots[] = { > [GREP_COLOR_CONTEXT] = "context", > [GREP_COLOR_FILENAME] = "filename", > @@ -355,6 +331,28 @@ static int is_fixed(const char *s, size_t len) > } > > #ifdef USE_LIBPCRE2 > +#define GREP_PCRE2_DEBUG_MALLOC 0 > + > +static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data) > +{ > + void *pointer = malloc(size); > +#if GREP_PCRE2_DEBUG_MALLOC > + static int count = 1; > + fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size); > +#endif > + return pointer; > +} > + > +static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) > +{ > +#if GREP_PCRE2_DEBUG_MALLOC > + static int count = 1; > + if (pointer) > + fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++); > +#endif > + free(pointer); > +} > + > static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) > { > int error; > -- > 2.30.0.284.gd98b1dd5eaa7 > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v2 3/3] grep: avoid leak of chartables in PCRE2 2019-10-16 12:10 ` [PATCH v2 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-16 12:10 ` Carlo Marcelo Arenas Belón via GitGitGadget 2 siblings, 0 replies; 42+ messages in thread From: Carlo Marcelo Arenas Belón via GitGitGadget @ 2019-10-16 12:10 UTC (permalink / raw) To: git Cc: Carlo Marcelo Arenas Belón, Johannes Schindelin, Junio C Hamano, Carlo Marcelo Arenas Belón From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@gmail.com> 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) introduced a small memory leak visible with valgrind in t7813. Complete the creation of a PCRE2 specific variable that was missing from the original change and free the generated table just like it is done for PCRE1. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- grep.c | 7 ++++--- grep.h | 1 + 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/grep.c b/grep.c index 296edbb56f..9556d13dc1 100644 --- a/grep.c +++ b/grep.c @@ -525,7 +525,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt PCRE2_UCHAR errbuf[256]; PCRE2_SIZE erroffset; int options = PCRE2_MULTILINE; - const uint8_t *character_tables = NULL; int jitret; int patinforet; size_t jitsizearg; @@ -539,9 +538,10 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt if (has_non_ascii(p->pattern)) { if (!pcre2_global_context) BUG("pcre2_global_context uninitialized"); - character_tables = pcre2_maketables(pcre2_global_context); + p->pcre2_tables = pcre2_maketables(pcre2_global_context); p->pcre2_compile_context = pcre2_compile_context_create(NULL); - pcre2_set_character_tables(p->pcre2_compile_context, character_tables); + pcre2_set_character_tables(p->pcre2_compile_context, + p->pcre2_tables); } options |= PCRE2_CASELESS; } @@ -645,6 +645,7 @@ static void free_pcre2_pattern(struct grep_pat *p) pcre2_match_data_free(p->pcre2_match_data); pcre2_jit_stack_free(p->pcre2_jit_stack); pcre2_match_context_free(p->pcre2_match_context); + free((void *)p->pcre2_tables); } #else /* !USE_LIBPCRE2 */ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) diff --git a/grep.h b/grep.h index 526c2db9ef..533165c21a 100644 --- a/grep.h +++ b/grep.h @@ -96,6 +96,7 @@ struct grep_pat { pcre2_compile_context *pcre2_compile_context; pcre2_match_context *pcre2_match_context; pcre2_jit_stack *pcre2_jit_stack; + const uint8_t *pcre2_tables; uint32_t pcre2_jit_on; kwset_t kws; unsigned fixed:1; -- gitgitgadget ^ permalink raw reply related [flat|nested] 42+ messages in thread
end of thread, other threads:[~2021-03-04 1:11 UTC | newest] Thread overview: 42+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-10-16 12:06 [PATCH 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget 2019-10-16 12:06 ` [PATCH 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-16 12:06 ` [PATCH 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-16 12:06 ` [PATCH 3/3] grep: avoid leak of chartables in PCRE2 Johannes Schindelin via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget 2019-10-18 1:38 ` Junio C Hamano 2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason 2021-02-10 21:34 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 " Ævar Arnfjörð Bjarmason 2021-03-04 0:34 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 02/10] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 03/10] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 06/10] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason 2021-03-04 0:14 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason 2021-02-18 0:07 ` [PATCH v2 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason 2021-03-04 0:24 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 09/10] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason 2021-03-04 0:27 ` Junio C Hamano 2021-02-18 0:07 ` [PATCH v2 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason 2021-03-04 0:28 ` Junio C Hamano 2021-02-04 21:05 ` [PATCH 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 02/10] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 03/10] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason 2021-02-10 10:38 ` Johannes Schindelin 2021-02-04 21:05 ` [PATCH 06/10] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason 2021-02-10 10:43 ` Johannes Schindelin 2021-03-04 0:16 ` Junio C Hamano 2021-02-04 21:05 ` [PATCH 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason 2021-02-10 12:38 ` Johannes Schindelin 2021-02-04 21:05 ` [PATCH 09/10] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason 2021-02-04 21:05 ` [PATCH 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason 2021-02-10 12:40 ` Johannes Schindelin 2019-10-16 12:10 ` [PATCH v2 3/3] grep: avoid leak of chartables in PCRE2 Carlo Marcelo Arenas Belón via GitGitGadget
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).