git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] grep: memory leak in PCRE2
@ 2019-07-27 20:27 Carlo Marcelo Arenas Belón
  2019-07-27 20:27 ` [PATCH 1/3] grep: make pcre1_tables version agnostic Carlo Marcelo Arenas Belón
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2019-07-27 20:27 UTC (permalink / raw)
  To: git; +Cc: avarab

This series fixes a small memory leak that was introduced when PCRE2
support was added, and that should be visible with t7813 and valgrind

Applies cleanly to maint and master but will conflict slightly with
ab/no-kwset (currently in next) and most likely other changes introduced
about this same codebase (ex: ab/pcre-jit-fixes) but hopefully the
spreading on short simple commits helps with reviewing.

Carlo Marcelo Arenas Belón (3):
  grep: make pcre1_tables version agnostic
  grep: use pcre_tables also for PCRE2
  grep: plug leak of pcre chartables in PCRE2

 grep.c | 12 ++++++------
 grep.h |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

-- 
2.22.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] grep: make pcre1_tables version agnostic
  2019-07-27 20:27 [PATCH 0/3] grep: memory leak in PCRE2 Carlo Marcelo Arenas Belón
@ 2019-07-27 20:27 ` Carlo Marcelo Arenas Belón
  2019-07-27 23:47   ` Ævar Arnfjörð Bjarmason
  2019-07-27 20:27 ` [PATCH 2/3] grep: use pcre_tables also for PCRE2 Carlo Marcelo Arenas Belón
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2019-07-27 20:27 UTC (permalink / raw)
  To: git; +Cc: avarab

6d4b5747f0 ("grep: change internal *pcre* variable & function names
to be *pcre1*", 2017-05-25), renamed most variables to be PCRE1
specific to give space to similarly named variables for PCRE2, but
in this case the change wasn't needed as the types were compatible
enough (const unsigned char* vs const uint8_t*) to be shared.

Revert that change, as 94da9193a6 ("grep: add support for PCRE v2",
2017-06-01) failed to create an equivalent PCRE2 version.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 grep.c | 6 +++---
 grep.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index f7c3a5803e..cc65f7a987 100644
--- a/grep.c
+++ b/grep.c
@@ -389,14 +389,14 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 
 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern))
-			p->pcre1_tables = pcre_maketables();
+			p->pcre_tables = pcre_maketables();
 		options |= PCRE_CASELESS;
 	}
 	if (is_utf8_locale() && has_non_ascii(p->pattern))
 		options |= PCRE_UTF8;
 
 	p->pcre1_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
-				      p->pcre1_tables);
+				      p->pcre_tables);
 	if (!p->pcre1_regexp)
 		compile_regexp_failed(p, error);
 
@@ -462,7 +462,7 @@ static void free_pcre1_regexp(struct grep_pat *p)
 	{
 		pcre_free(p->pcre1_extra_info);
 	}
-	pcre_free((void *)p->pcre1_tables);
+	pcre_free((void *)p->pcre_tables);
 }
 #else /* !USE_LIBPCRE1 */
 static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
diff --git a/grep.h b/grep.h
index 1875880f37..d34f66b384 100644
--- a/grep.h
+++ b/grep.h
@@ -89,7 +89,7 @@ struct grep_pat {
 	pcre *pcre1_regexp;
 	pcre_extra *pcre1_extra_info;
 	pcre_jit_stack *pcre1_jit_stack;
-	const unsigned char *pcre1_tables;
+	const unsigned char *pcre_tables;
 	int pcre1_jit_on;
 	pcre2_code *pcre2_pattern;
 	pcre2_match_data *pcre2_match_data;
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] grep: use pcre_tables also for PCRE2
  2019-07-27 20:27 [PATCH 0/3] grep: memory leak in PCRE2 Carlo Marcelo Arenas Belón
  2019-07-27 20:27 ` [PATCH 1/3] grep: make pcre1_tables version agnostic Carlo Marcelo Arenas Belón
@ 2019-07-27 20:27 ` Carlo Marcelo Arenas Belón
  2019-07-27 20:27 ` [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2 Carlo Marcelo Arenas Belón
  2019-08-01 17:09 ` [PATCH v2] grep: avoid leak of " Carlo Marcelo Arenas Belón
  3 siblings, 0 replies; 15+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2019-07-27 20:27 UTC (permalink / raw)
  To: git; +Cc: avarab

94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) added a
local variable to keep track of the chartables (used when locale
is not UTF-8 but non-ASCII characters are needed)

Remove that local variable in favor of the shared one within the
grep structure and that can be shared by all functions.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 grep.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/grep.c b/grep.c
index cc65f7a987..d04635fad4 100644
--- a/grep.c
+++ b/grep.c
@@ -488,7 +488,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	PCRE2_UCHAR errbuf[256];
 	PCRE2_SIZE erroffset;
 	int options = PCRE2_MULTILINE;
-	const uint8_t *character_tables = NULL;
 	int jitret;
 	int patinforet;
 	size_t jitsizearg;
@@ -499,9 +498,9 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 
 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern)) {
-			character_tables = pcre2_maketables(NULL);
+			p->pcre_tables = pcre2_maketables(NULL);
 			p->pcre2_compile_context = pcre2_compile_context_create(NULL);
-			pcre2_set_character_tables(p->pcre2_compile_context, character_tables);
+			pcre2_set_character_tables(p->pcre2_compile_context, p->pcre_tables);
 		}
 		options |= PCRE2_CASELESS;
 	}
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2
  2019-07-27 20:27 [PATCH 0/3] grep: memory leak in PCRE2 Carlo Marcelo Arenas Belón
  2019-07-27 20:27 ` [PATCH 1/3] grep: make pcre1_tables version agnostic Carlo Marcelo Arenas Belón
  2019-07-27 20:27 ` [PATCH 2/3] grep: use pcre_tables also for PCRE2 Carlo Marcelo Arenas Belón
@ 2019-07-27 20:27 ` Carlo Marcelo Arenas Belón
  2019-07-27 23:48   ` Ævar Arnfjörð Bjarmason
  2019-08-01 17:09 ` [PATCH v2] grep: avoid leak of " Carlo Marcelo Arenas Belón
  3 siblings, 1 reply; 15+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2019-07-27 20:27 UTC (permalink / raw)
  To: git; +Cc: avarab

Just as it is done with PCRE1, make sure that the allocated chartables
get free at cleanup time.

This assumes no global context is used (NULL passed when created the
tables), but will likely be updated in tandem if that ever changes.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 grep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/grep.c b/grep.c
index d04635fad4..d9768c5f05 100644
--- a/grep.c
+++ b/grep.c
@@ -604,6 +604,7 @@ static void free_pcre2_pattern(struct grep_pat *p)
 	pcre2_match_data_free(p->pcre2_match_data);
 	pcre2_jit_stack_free(p->pcre2_jit_stack);
 	pcre2_match_context_free(p->pcre2_match_context);
+	free((void *)p->pcre_tables);
 }
 #else /* !USE_LIBPCRE2 */
 static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] grep: make pcre1_tables version agnostic
  2019-07-27 20:27 ` [PATCH 1/3] grep: make pcre1_tables version agnostic Carlo Marcelo Arenas Belón
@ 2019-07-27 23:47   ` Ævar Arnfjörð Bjarmason
  2019-07-28  2:50     ` Carlo Arenas
  0 siblings, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-07-27 23:47 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón; +Cc: git


On Sat, Jul 27 2019, Carlo Marcelo Arenas Belón wrote:

> 6d4b5747f0 ("grep: change internal *pcre* variable & function names
> to be *pcre1*", 2017-05-25), renamed most variables to be PCRE1
> specific to give space to similarly named variables for PCRE2, but
> in this case the change wasn't needed as the types were compatible
> enough (const unsigned char* vs const uint8_t*) to be shared.

Both the v1 and v2 functions return const unsigned char *. I don't know
where I got the uint8_t from. This makes more sense.

This series looks good to me. Thanks for the fix. Just one caveat:

The point of 6d4b5747f0 was not to only split out those variables we
couldn't get away with re-using. Then I would have later re-used
e.g. pcre1_jit_on & pcre2_jit_on as just pcre_jit_on. We could also do
that now.

I think doing that & this part of the your changes makes things less
readable. The two code branches we compile with ifdefs are mutually
exclusive, so having the variables be unique helps with eyeballing /
reasoning when changing the code.

> Revert that change, as 94da9193a6 ("grep: add support for PCRE v2",
> 2017-06-01) failed to create an equivalent PCRE2 version.
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  grep.c | 6 +++---
>  grep.h | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index f7c3a5803e..cc65f7a987 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -389,14 +389,14 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
>
>  	if (opt->ignore_case) {
>  		if (has_non_ascii(p->pattern))
> -			p->pcre1_tables = pcre_maketables();
> +			p->pcre_tables = pcre_maketables();
>  		options |= PCRE_CASELESS;
>  	}
>  	if (is_utf8_locale() && has_non_ascii(p->pattern))
>  		options |= PCRE_UTF8;
>
>  	p->pcre1_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
> -				      p->pcre1_tables);
> +				      p->pcre_tables);
>  	if (!p->pcre1_regexp)
>  		compile_regexp_failed(p, error);
>
> @@ -462,7 +462,7 @@ static void free_pcre1_regexp(struct grep_pat *p)
>  	{
>  		pcre_free(p->pcre1_extra_info);
>  	}
> -	pcre_free((void *)p->pcre1_tables);
> +	pcre_free((void *)p->pcre_tables);
>  }
>  #else /* !USE_LIBPCRE1 */
>  static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
> diff --git a/grep.h b/grep.h
> index 1875880f37..d34f66b384 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -89,7 +89,7 @@ struct grep_pat {
>  	pcre *pcre1_regexp;
>  	pcre_extra *pcre1_extra_info;
>  	pcre_jit_stack *pcre1_jit_stack;
> -	const unsigned char *pcre1_tables;
> +	const unsigned char *pcre_tables;
>  	int pcre1_jit_on;
>  	pcre2_code *pcre2_pattern;
>  	pcre2_match_data *pcre2_match_data;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2
  2019-07-27 20:27 ` [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2 Carlo Marcelo Arenas Belón
@ 2019-07-27 23:48   ` Ævar Arnfjörð Bjarmason
  2019-07-28  1:41     ` Carlo Arenas
  0 siblings, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-07-27 23:48 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón; +Cc: git


On Sat, Jul 27 2019, Carlo Marcelo Arenas Belón wrote:

> Just as it is done with PCRE1, make sure that the allocated chartables
> get free at cleanup time.
>
> This assumes no global context is used (NULL passed when created the
> tables), but will likely be updated in tandem if that ever changes.
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  grep.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/grep.c b/grep.c
> index d04635fad4..d9768c5f05 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -604,6 +604,7 @@ static void free_pcre2_pattern(struct grep_pat *p)
>  	pcre2_match_data_free(p->pcre2_match_data);
>  	pcre2_jit_stack_free(p->pcre2_jit_stack);
>  	pcre2_match_context_free(p->pcre2_match_context);
> +	free((void *)p->pcre_tables);

Is the cast really needed? I'm rusty on the rules, removing it from the
pcre_free() you might have copied this from produces a warning for me,
but not for free() itself. This is on GCC 8.3.0. How about for you &
what compiler(s)?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2
  2019-07-27 23:48   ` Ævar Arnfjörð Bjarmason
@ 2019-07-28  1:41     ` Carlo Arenas
  2019-07-29 20:34       ` René Scharfe
  0 siblings, 1 reply; 15+ messages in thread
From: Carlo Arenas @ 2019-07-28  1:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Sat, Jul 27, 2019 at 4:48 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> > +     free((void *)p->pcre_tables);
>
> Is the cast really needed? I'm rusty on the rules, removing it from the
> pcre_free() you might have copied this from produces a warning for me,
> but not for free() itself. This is on GCC 8.3.0. How about for you &
> what compiler(s)?

both will trigger warnings for the same reason
(-Wincompatible-pointer-types-discards-qualifiers)
with Apple LLVM version 10.0.1 (clang-1001.0.46.4)

gcc-9 in macOS triggers 2 "warnings"; one for discarding the const
qualifier (-Wdiscarded-qualifiers)
and another for mismatching parameter to free():

note: expected 'void *' but argument is of type 'const uint8_t *' {aka
'const unsigned char *'}

Carlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] grep: make pcre1_tables version agnostic
  2019-07-27 23:47   ` Ævar Arnfjörð Bjarmason
@ 2019-07-28  2:50     ` Carlo Arenas
  0 siblings, 0 replies; 15+ messages in thread
From: Carlo Arenas @ 2019-07-28  2:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Sat, Jul 27, 2019 at 4:47 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Jul 27 2019, Carlo Marcelo Arenas Belón wrote:
>
> > 6d4b5747f0 ("grep: change internal *pcre* variable & function names
> > to be *pcre1*", 2017-05-25), renamed most variables to be PCRE1
> > specific to give space to similarly named variables for PCRE2, but
> > in this case the change wasn't needed as the types were compatible
> > enough (const unsigned char* vs const uint8_t*) to be shared.

reading this again, had to admit there is a fair amount of guessing on
intent, but reading commits and the email discussion couldn't come
up with a better explanation.

is the root cause for the bug different?, could it be that the pcre2 API
was misunderstood? and you expected this pointer will be free together
with the context? (as it is done when a cloned context with tables is
used?)

> Both the v1 and v2 functions return const unsigned char *. I don't know
> where I got the uint8_t from. This makes more sense.

the type in PCRE2 is uint8_t, the documentation has a bug[1]
almost every platform git cares for would have unsigned char = uint8_t
though.

> The point of 6d4b5747f0 was not to only split out those variables we
> couldn't get away with re-using. Then I would have later re-used
> e.g. pcre1_jit_on & pcre2_jit_on as just pcre_jit_on. We could also do
> that now.

IMHO pcre_jit_on makes more sense as a bit, with local variables being
used for the pcre*_config() call with the right type.(uint32_t != int)

> I think doing that & this part of the your changes makes things less
> readable. The two code branches we compile with ifdefs are mutually
> exclusive, so having the variables be unique helps with eyeballing /
> reasoning when changing the code.

I thought that too until the typo[2] in the pcre?_jit_on variable (now
in next) kind of
proved us wrong.  Maybe the names are too similar?

either way, would you rather drop this patch and make a replica variable?
should I rebase to next with Reviewed-By tags so that all other changes
in flight that would conflict could be corrected?

Carlo

[1] https://bugs.exim.org/show_bug.cgi?id=2420
[2] https://public-inbox.org/git/20190722181923.21572-1-dev+git@drbeat.li/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2
  2019-07-28  1:41     ` Carlo Arenas
@ 2019-07-29 20:34       ` René Scharfe
  2019-07-30  0:08         ` Carlo Arenas
  0 siblings, 1 reply; 15+ messages in thread
From: René Scharfe @ 2019-07-29 20:34 UTC (permalink / raw)
  To: Carlo Arenas, Ævar Arnfjörð Bjarmason; +Cc: git

Am 28.07.19 um 03:41 schrieb Carlo Arenas:
> On Sat, Jul 27, 2019 at 4:48 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>> +     free((void *)p->pcre_tables);
>>
>> Is the cast really needed? I'm rusty on the rules, removing it from the
>> pcre_free() you might have copied this from produces a warning for me,
>> but not for free() itself. This is on GCC 8.3.0. How about for you &
>> what compiler(s)?
>
> both will trigger warnings for the same reason
> (-Wincompatible-pointer-types-discards-qualifiers)
> with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
>
> gcc-9 in macOS triggers 2 "warnings"; one for discarding the const
> qualifier (-Wdiscarded-qualifiers)
> and another for mismatching parameter to free():
>
> note: expected 'void *' but argument is of type 'const uint8_t *' {aka
> 'const unsigned char *'}

Right: pcre_maketables() returns a const pointer, and you have to cast
away this const'ness at some point if you want to free(3) that memory.
Returning a non-const pointer would be more fitting, but I guess the
idea is that users of the library are not supposed to change the
contents of the table.

But wouldn't it be more correct to use pcre_free()?  As long as we keep
pcre_malloc() and pcre_free() at their default values it doesn't matter
in practice, but using free(3) directly is a layering violation, no?

Perhaps just UNLEAK that thing?  There is only a single way to build it
and we can reuse it throughout the lifetime of the program, so there is
no real need to clean it up before the OS does.

René

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2
  2019-07-29 20:34       ` René Scharfe
@ 2019-07-30  0:08         ` Carlo Arenas
  2019-07-30 16:52           ` René Scharfe
  0 siblings, 1 reply; 15+ messages in thread
From: Carlo Arenas @ 2019-07-30  0:08 UTC (permalink / raw)
  To: René Scharfe; +Cc: Ævar Arnfjörð Bjarmason, git

On Mon, Jul 29, 2019 at 1:35 PM René Scharfe <l.s.r@web.de> wrote:
>
> Am 28.07.19 um 03:41 schrieb Carlo Arenas:
> > On Sat, Jul 27, 2019 at 4:48 PM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>> +     free((void *)p->pcre_tables);
> >>
> >> Is the cast really needed? I'm rusty on the rules, removing it from the
> >> pcre_free() you might have copied this from produces a warning for me,
> >> but not for free() itself. This is on GCC 8.3.0. How about for you &
> >> what compiler(s)?
> >
> > both will trigger warnings for the same reason
> > (-Wincompatible-pointer-types-discards-qualifiers)
> > with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
> >
> > gcc-9 in macOS triggers 2 "warnings"; one for discarding the const
> > qualifier (-Wdiscarded-qualifiers)
> > and another for mismatching parameter to free():
> >
> > note: expected 'void *' but argument is of type 'const uint8_t *' {aka
> > 'const unsigned char *'}
>
> Right: pcre_maketables() returns a const pointer, and you have to cast
> away this const'ness at some point if you want to free(3) that memory.
> Returning a non-const pointer would be more fitting, but I guess the
> idea is that users of the library are not supposed to change the
> contents of the table.

note that this pointer was generated by pcre2_maketables() instead
which is actually "const uint8_t *", but yes these tables are meant to
be inmutable and that is why they are "cost".

there is even a compile time generated table for the C locale that is
used internally and obviously shouldn't be free().

> But wouldn't it be more correct to use pcre_free()?  As long as we keep
> pcre_malloc() and pcre_free() at their default values it doesn't matter
> in practice, but using free(3) directly is a layering violation, no?

yes, but that is the only option PCRE2 gives when not using a global
context which is what the comment in the commit refers to.

FWIW pcre_free() doesn't exist anymore in PCRE2.

> Perhaps just UNLEAK that thing?  There is only a single way to build it
> and we can reuse it throughout the lifetime of the program, so there is
> no real need to clean it up before the OS does.

That would be a better fit if it would be created once in cmd_grep and
then shared with all worker threads (which I thought would be nice to
do in the future anyway), but this change was trying to be conservative
and just to the minimum to close the leak.

Carlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2
  2019-07-30  0:08         ` Carlo Arenas
@ 2019-07-30 16:52           ` René Scharfe
  0 siblings, 0 replies; 15+ messages in thread
From: René Scharfe @ 2019-07-30 16:52 UTC (permalink / raw)
  To: Carlo Arenas; +Cc: Ævar Arnfjörð Bjarmason, git

Am 30.07.19 um 02:08 schrieb Carlo Arenas:
> On Mon, Jul 29, 2019 at 1:35 PM René Scharfe <l.s.r@web.de> wrote:
>>
>> Am 28.07.19 um 03:41 schrieb Carlo Arenas:
>>> On Sat, Jul 27, 2019 at 4:48 PM Ævar Arnfjörð Bjarmason
>>> <avarab@gmail.com> wrote:
>>>>> +     free((void *)p->pcre_tables);
>>>>
>>>> Is the cast really needed? I'm rusty on the rules, removing it from the
>>>> pcre_free() you might have copied this from produces a warning for me,
>>>> but not for free() itself. This is on GCC 8.3.0. How about for you &
>>>> what compiler(s)?
>>>
>>> both will trigger warnings for the same reason
>>> (-Wincompatible-pointer-types-discards-qualifiers)
>>> with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
>>>
>>> gcc-9 in macOS triggers 2 "warnings"; one for discarding the const
>>> qualifier (-Wdiscarded-qualifiers)
>>> and another for mismatching parameter to free():
>>>
>>> note: expected 'void *' but argument is of type 'const uint8_t *' {aka
>>> 'const unsigned char *'}
>>
>> Right: pcre_maketables() returns a const pointer, and you have to cast
>> away this const'ness at some point if you want to free(3) that memory.
>> Returning a non-const pointer would be more fitting, but I guess the
>> idea is that users of the library are not supposed to change the
>> contents of the table.
>
> note that this pointer was generated by pcre2_maketables() instead
> which is actually "const uint8_t *", but yes these tables are meant to
> be inmutable and that is why they are "cost".

Only the const'ness matters in regards to the need for casting a pointer
to feed to free(3).

Doing the cast in a library function or using an opaque pointer type
instead of a fake const pointer would be nicer ways on the part of PCRE2
to keep callers from messing with the table.  Forcing users to cast
const away or leak is not very nice..  Nothing we can do about it,
except perhaps adding a wish list item for PCRE3, I guess.

https://pcre.org/current/doc/html/pcre2_maketables.html says that
pcre2_maketables() returns const unsigned char *, by the way.  I don't
get the PCRE2_SUFFIX business in pcre2.h ("Define macros that generate
width-specific names from generic versions."), though.

>> But wouldn't it be more correct to use pcre_free()?  As long as we keep
>> pcre_malloc() and pcre_free() at their default values it doesn't matter
>> in practice, but using free(3) directly is a layering violation, no?
>
> yes, but that is the only option PCRE2 gives when not using a global
> context which is what the comment in the commit refers to.
>
> FWIW pcre_free() doesn't exist anymore in PCRE2.

OK, and while https://pcre.org/original/doc/html/pcreapi.html#TOC1
says that pcre_malloc() is used by pcre_maketables() (and thus the
result should be passed to pcre_free() after use),
https://pcre.org/current/doc/html/pcre2_maketables.html says
pcre2_maketables() uses malloc(3), so that pointer needs to go to
free(3) at the end.  Missed the second part in my earlier reply.

>> Perhaps just UNLEAK that thing?  There is only a single way to build it
>> and we can reuse it throughout the lifetime of the program, so there is
>> no real need to clean it up before the OS does.
>
> That would be a better fit if it would be created once in cmd_grep and
> then shared with all worker threads (which I thought would be nice to
> do in the future anyway), but this change was trying to be conservative
> and just to the minimum to close the leak.

Sure.  I wonder how sharing between threads would influence performance..

René

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] grep: avoid leak of chartables in PCRE2
  2019-07-27 20:27 [PATCH 0/3] grep: memory leak in PCRE2 Carlo Marcelo Arenas Belón
                   ` (2 preceding siblings ...)
  2019-07-27 20:27 ` [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2 Carlo Marcelo Arenas Belón
@ 2019-08-01 17:09 ` Carlo Marcelo Arenas Belón
  2019-08-02 16:19   ` Junio C Hamano
  3 siblings, 1 reply; 15+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2019-08-01 17:09 UTC (permalink / raw)
  To: git; +Cc: avarab, l.s.r, gitster

94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) introduced
a small memory leak visible with valgrind in t7813.

Complete the creation of a PCRE2 specific variable that was missing from
the original change and free the generated table just like it is done
for PCRE1.

The table cleanup use free as there is no global context defined when
it was created (pcre2_maketables is passed a NULL pointer) but if that
gets ever changed will need to be updated in tandem.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
V2:
* better document why free is used as suggested by René
* avoid reusing PCRE1 variable for easy of maintenance (per Ævar)

 grep.c | 7 ++++---
 grep.h | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/grep.c b/grep.c
index f7c3a5803e..fbd3f3757c 100644
--- a/grep.c
+++ b/grep.c
@@ -488,7 +488,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	PCRE2_UCHAR errbuf[256];
 	PCRE2_SIZE erroffset;
 	int options = PCRE2_MULTILINE;
-	const uint8_t *character_tables = NULL;
 	int jitret;
 	int patinforet;
 	size_t jitsizearg;
@@ -499,9 +498,10 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 
 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern)) {
-			character_tables = pcre2_maketables(NULL);
+			p->pcre2_tables = pcre2_maketables(NULL);
 			p->pcre2_compile_context = pcre2_compile_context_create(NULL);
-			pcre2_set_character_tables(p->pcre2_compile_context, character_tables);
+			pcre2_set_character_tables(p->pcre2_compile_context,
+							p->pcre2_tables);
 		}
 		options |= PCRE2_CASELESS;
 	}
@@ -605,6 +605,7 @@ static void free_pcre2_pattern(struct grep_pat *p)
 	pcre2_match_data_free(p->pcre2_match_data);
 	pcre2_jit_stack_free(p->pcre2_jit_stack);
 	pcre2_match_context_free(p->pcre2_match_context);
+	free((void *)p->pcre2_tables);
 }
 #else /* !USE_LIBPCRE2 */
 static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
diff --git a/grep.h b/grep.h
index 1875880f37..26d21a3433 100644
--- a/grep.h
+++ b/grep.h
@@ -96,6 +96,7 @@ struct grep_pat {
 	pcre2_compile_context *pcre2_compile_context;
 	pcre2_match_context *pcre2_match_context;
 	pcre2_jit_stack *pcre2_jit_stack;
+	const uint8_t *pcre2_tables;
 	uint32_t pcre2_jit_on;
 	kwset_t kws;
 	unsigned fixed:1;
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] grep: avoid leak of chartables in PCRE2
  2019-08-01 17:09 ` [PATCH v2] grep: avoid leak of " Carlo Marcelo Arenas Belón
@ 2019-08-02 16:19   ` Junio C Hamano
  2019-08-03 18:50     ` Carlo Arenas
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2019-08-02 16:19 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón; +Cc: git, avarab, l.s.r

Carlo Marcelo Arenas Belón  <carenas@gmail.com> writes:

> 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) introduced
> a small memory leak visible with valgrind in t7813.
>
> Complete the creation of a PCRE2 specific variable that was missing from
> the original change and free the generated table just like it is done
> for PCRE1.
>
> The table cleanup use free as there is no global context defined when
> it was created (pcre2_maketables is passed a NULL pointer) but if that
> gets ever changed will need to be updated in tandem.
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
> V2:
> * better document why free is used as suggested by René
> * avoid reusing PCRE1 variable for easy of maintenance (per Ævar)

Thanks; will queue.

>
>  grep.c | 7 ++++---
>  grep.h | 1 +
>  2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index f7c3a5803e..fbd3f3757c 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -488,7 +488,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  	PCRE2_UCHAR errbuf[256];
>  	PCRE2_SIZE erroffset;
>  	int options = PCRE2_MULTILINE;
> -	const uint8_t *character_tables = NULL;
>  	int jitret;
>  	int patinforet;
>  	size_t jitsizearg;
> @@ -499,9 +498,10 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  
>  	if (opt->ignore_case) {
>  		if (has_non_ascii(p->pattern)) {
> -			character_tables = pcre2_maketables(NULL);
> +			p->pcre2_tables = pcre2_maketables(NULL);
>  			p->pcre2_compile_context = pcre2_compile_context_create(NULL);
> -			pcre2_set_character_tables(p->pcre2_compile_context, character_tables);
> +			pcre2_set_character_tables(p->pcre2_compile_context,
> +							p->pcre2_tables);
>  		}
>  		options |= PCRE2_CASELESS;
>  	}
> @@ -605,6 +605,7 @@ static void free_pcre2_pattern(struct grep_pat *p)
>  	pcre2_match_data_free(p->pcre2_match_data);
>  	pcre2_jit_stack_free(p->pcre2_jit_stack);
>  	pcre2_match_context_free(p->pcre2_match_context);
> +	free((void *)p->pcre2_tables);
>  }
>  #else /* !USE_LIBPCRE2 */
>  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
> diff --git a/grep.h b/grep.h
> index 1875880f37..26d21a3433 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -96,6 +96,7 @@ struct grep_pat {
>  	pcre2_compile_context *pcre2_compile_context;
>  	pcre2_match_context *pcre2_match_context;
>  	pcre2_jit_stack *pcre2_jit_stack;
> +	const uint8_t *pcre2_tables;
>  	uint32_t pcre2_jit_on;
>  	kwset_t kws;
>  	unsigned fixed:1;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] grep: avoid leak of chartables in PCRE2
  2019-08-02 16:19   ` Junio C Hamano
@ 2019-08-03 18:50     ` Carlo Arenas
  2019-08-05 19:34       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Carlo Arenas @ 2019-08-03 18:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, avarab, l.s.r

Junio

Thanks for fixing the conflicts in pu as well; apologize if I could
had made it easier by doings things differently

Carlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] grep: avoid leak of chartables in PCRE2
  2019-08-03 18:50     ` Carlo Arenas
@ 2019-08-05 19:34       ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2019-08-05 19:34 UTC (permalink / raw)
  To: Carlo Arenas; +Cc: git, avarab, l.s.r

Carlo Arenas <carenas@gmail.com> writes:

> Thanks for fixing the conflicts in pu as well; apologize if I could
> had made it easier by doings things differently

With many topics in flight, conflicts between topics happen all the
time.  Thanks for checking to catch a mismerge.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-08-05 19:35 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-27 20:27 [PATCH 0/3] grep: memory leak in PCRE2 Carlo Marcelo Arenas Belón
2019-07-27 20:27 ` [PATCH 1/3] grep: make pcre1_tables version agnostic Carlo Marcelo Arenas Belón
2019-07-27 23:47   ` Ævar Arnfjörð Bjarmason
2019-07-28  2:50     ` Carlo Arenas
2019-07-27 20:27 ` [PATCH 2/3] grep: use pcre_tables also for PCRE2 Carlo Marcelo Arenas Belón
2019-07-27 20:27 ` [PATCH 3/3] grep: plug leak of pcre chartables in PCRE2 Carlo Marcelo Arenas Belón
2019-07-27 23:48   ` Ævar Arnfjörð Bjarmason
2019-07-28  1:41     ` Carlo Arenas
2019-07-29 20:34       ` René Scharfe
2019-07-30  0:08         ` Carlo Arenas
2019-07-30 16:52           ` René Scharfe
2019-08-01 17:09 ` [PATCH v2] grep: avoid leak of " Carlo Marcelo Arenas Belón
2019-08-02 16:19   ` Junio C Hamano
2019-08-03 18:50     ` Carlo Arenas
2019-08-05 19:34       ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).