From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>,
"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v2 00/10] grep/pcre2: memory allocation fixes
Date: Thu, 18 Feb 2021 01:07:18 +0100 [thread overview]
Message-ID: <20210218000728.13995-1-avarab@gmail.com> (raw)
In-Reply-To: <20210204210556.25242-1-avarab@gmail.com>
Now based on "master", when I sent v1[1] it dependen on other
in-flight topics of mine.
Addresses minor issues with the commit messages raised by Johannes on
v1, and other commit message issues I noticed myself on re-reading
this.
1. https://lore.kernel.org/git/20210204210556.25242-1-avarab@gmail.com/
Ævar Arnfjörð Bjarmason (10):
grep/pcre2: drop needless assignment + assert() on opt->pcre2
grep/pcre2: drop needless assignment to NULL
grep/pcre2: correct reference to grep_init() in comment
grep/pcre2: prepare to add debugging to pcre2_malloc()
grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode
grep/pcre2: use compile-time PCREv2 version test
grep/pcre2: use pcre2_maketables_free() function
grep/pcre2: actually make pcre2 use custom allocator
grep/pcre2: move back to thread-only PCREv2 structures
grep/pcre2: move definitions of pcre2_{malloc,free}
builtin/grep.c | 1 -
grep.c | 99 ++++++++++++++++++++++----------------------------
grep.h | 9 ++++-
3 files changed, 51 insertions(+), 58 deletions(-)
Range-diff:
1: a11f1e91552 ! 1: 40d2e47c540 grep/pcre2: drop needless assignment + assert() on opt->pcre2
@@ Commit message
There was never a good reason for this, it's just a relic from when I
initially wrote the PCREv2 support. We're not going to have confusion
about compile_pcre2_pattern() being called when it shouldn't just
- because we forgot to cargo-cult this opt->pcre2 option, and "opt"
- is (mostly) used for the options the user supplied, let's avoid the
- pattern of needlessly assigning to it.
+ because we forgot to cargo-cult this opt->pcre2 option.
- With my in-flight removal of PCREv1 [1] ("Remove support for v1 of the
- PCRE library", 2021-01-24) there'll be even less confusion around what
- we call where in these codepaths, which is one more reason to remove
- this.
+ Furthermore the "struct grep_opt" is (mostly) used for the options the
+ user supplied, let's avoid the pattern of needlessly assigning to it.
- 1. https://lore.kernel.org/git/xmqqmtwy29x8.fsf@gitster.c.googlers.com/
+ With my recent removal of the PCREv1 backend in 7599730b7e2 (Remove
+ support for v1 of the PCRE library, 2021-01-24) there's even less
+ confusion around what we call where in these codepaths, which is one
+ more reason to remove this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
2: db0ef9189e3 ! 2: e02f9b5ab50 grep/pcre2: drop needless assignment to NULL
@@ Commit message
grep/pcre2: drop needless assignment to NULL
Remove a redundant assignment of pcre2_compile_context dating back to
- my 94da9193a6e (grep: add support for PCRE v2, 2017-06-01). In
- create_grep_pat() we xcalloc() the "grep_pat" struct, so there's no
+ my 94da9193a6e (grep: add support for PCRE v2, 2017-06-01).
+
+ In create_grep_pat() we xcalloc() the "grep_pat" struct, so there's no
need to NULL out individual members here.
I think this was probably something left over from an earlier
3: 9c5429f4d96 = 3: 2bcb6c53589 grep/pcre2: correct reference to grep_init() in comment
4: e5558f5f0f1 ! 4: 78477193cd4 grep/pcre2: prepare to add debugging to pcre2_malloc()
@@ Commit message
grep/pcre2: prepare to add debugging to pcre2_malloc()
Change pcre2_malloc() in a way that'll make it easier for a debugging
- fprintf() to spew out the allocated pointer. This doesn't introduce
- any functional change, it just makes a subsequent commit's diff easier
- to read. Changes code added in 513f2b0bbd4 (grep: make PCRE2 aware of
- custom allocator, 2019-10-16).
+ fprintf() to spew out the allocated pointer.
+
+ This doesn't introduce any functional change, it just makes a
+ subsequent commit's diff easier to read. Changes code added in
+ 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
5: 7968effaa8a ! 5: cbeb521bd65 grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode
@@ Commit message
grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode
Add optional printing of PCREv2 allocations to stderr for a developer
- who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to
- "1".
+ who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to "1".
+
+ You need to manually change the definition in the source file similar
+ to the DEBUG_MAILMAP, there's no Makefile knob for this.
This will be referenced a subsequent commit, and is generally useful
to manually see what's going on with PCREv2 allocations while working
6: 604e183c224 = 6: 788929c3de6 grep/pcre2: use compile-time PCREv2 version test
7: f864a9aac4c ! 7: 6a4552c6d4e grep/pcre2: use pcre2_maketables_free() function
@@ Commit message
grep/pcre2: use pcre2_maketables_free() function
Make use of the pcre2_maketables_free() function to free the memory
- allocated by pcre2_maketables(). At first sight it's strange that
- 10da030ab75 (grep: avoid leak of chartables in PCRE2, 2019-10-16)
- which added the free() call here doesn't make use of the pcre2_free()
- the author introduced in the preceding commit in 513f2b0bbd4 (grep:
- make PCRE2 aware of custom allocator, 2019-10-16).
+ allocated by pcre2_maketables().
+
+ At first sight it's strange that 10da030ab75 (grep: avoid leak of
+ chartables in PCRE2, 2019-10-16) which added the free() call here
+ doesn't make use of the pcre2_free() the author introduced in the
+ preceding commit in 513f2b0bbd4 (grep: make PCRE2 aware of custom
+ allocator, 2019-10-16).
The reason is that at the time the function didn't exist. It was first
introduced in PCREv2 version 10.34, released on 2019-11-21.
Let's make use of it behind a macro. I don't think this matters for
anything to do with custom allocators, but it makes our use of PCREv2
- more discoverable. At some distant point in the future we'll be able
- to drop the version guard, as nobody will be running a version older
- than 10.34.
+ more discoverable.
+
+ At some distant point in the future we'll be able to drop the version
+ guard, as nobody will be running a version older than 10.34.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
8: cea9e066951 ! 8: bf58d597a4b grep/pcre2: actually make pcre2 use custom allocator
@@ Commit message
functions for allocation. We'll now use it for all PCREv2 allocations.
The reason 513f2b0bbd4 worked as a bugfix for the USE_NED_ALLOCATOR
- issue is because it managed to target pretty much the allocation freed
- via free(), as opposed to by a pcre2_*free() function. I.e. the
- pcre2_maketables() and pcre2_maketables_free() pair. For most of the
- rest we continued allocating with stock malloc() inside PCREv2 itself,
- but didn't segfault because we'd use its corresponding free().
+ issue is because it targeted the allocation freed via free(), as
+ opposed to by a pcre2_*free() function. I.e. the pcre2_maketables()
+ and pcre2_maketables_free() pair.
+
+ For most of the rest we continued allocating with stock malloc()
+ inside PCREv2 itself, but didn't segfault because we'd use its
+ corresponding free().
In a preceding commit of mine I changed the free() to
pcre2_maketables_free() on versions of PCREv2 10.34 and newer. So as
@@ Commit message
grep --threads=1 -iP æ.*var.*xyz
Only use pcre2_{malloc,free}() for 2 malloc() calls and 2
- corresponding free() call. Now it's 12 calls to each. This can be
+ corresponding free() calls. Now it's 12 calls to each. This can be
observed with the GREP_PCRE2_DEBUG_MALLOC debug mode.
Reading the history of how this bug got introduced it wasn't present
@@ Commit message
Thus the failed attempts to go down the route of only creating the
general context in cases where we ourselves call pcre2_maketables(),
before finally settling on the approach 513f2b0bbd4 took of always
- creating it.
+ creating it, but then mostly not using it.
Instead we should always create it, and then pass the general context
to those functions that accept it, so that they'll consistently use
our preferred memory allocation functions.
- 1. https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/
+ 1. https://lore.kernel.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/
2. https://lore.kernel.org/git/CAPUEsphMh_ZqcH3M7PXC9jHTfEdQN3mhTAK2JDkdvKBp53YBoA@mail.gmail.com/
3. https://lore.kernel.org/git/20190806085014.47776-3-carenas@gmail.com/
9: 99eaa918261 ! 9: 129adf872fb grep/pcre2: move back to thread-only PCREv2 structures
@@ Commit message
grep/pcre2: move back to thread-only PCREv2 structures
Change the setup of the "pcre2_general_context" to happen per-thread
- in compile_pcre2_pattern() instead of in grep_init(), as happens with
- all the rest of the pcre2_* members of the grep_pat structure.
+ in compile_pcre2_pattern() instead of in grep_init().
+
+ This change brings it in line with how the rest of the pcre2_* members
+ in the grep_pat structure are set up.
As noted in the preceding commit the approach 513f2b0bbd4 (grep: make
PCRE2 aware of custom allocator, 2019-10-16) took to allocate the
pcre2_general_context seems to have been initially based on a
misunderstanding of how PCREv2 memory allocation works.
- This approach of creating a global context is just added complexity
- for almost zero gain. On my system it's 24 bytes saved per-thread, for
- context PCREv2 will then go on to some kilobytes for its own
- thread-local state.
+ The approach of creating a global context in grep_init() is just added
+ complexity for almost zero gain. On my system it's 24 bytes saved
+ per-thread. For comparison PCREv2 will then go on to allocate at least
+ a kilobyte for its own thread-local state.
As noted in 6d423dd542f (grep: don't redundantly compile throwaway
patterns under threading, 2017-05-25) the grep code is intentionally
@@ Commit message
structures globally, while making others thread-local.
So let's remove this special case and make all of them thread-local
- for simplicity again.
+ again for simplicity. With this change we could move the
+ pcre2_{malloc,free} functions around to live closer to their current
+ use. I'm not doing that here to keep this change small, that cleanup
+ will be done in a follow-up commit.
See also the discussion in 94da9193a6 (grep: add support for PCRE v2,
2017-06-01) about thread safety, and Johannes's comments[1] to the
10: 462f12126c8 ! 10: 688d1b00d5d grep/pcre2: move definitions of pcre2_{malloc,free}
@@ Commit message
grep/pcre2: move definitions of pcre2_{malloc,free}
Move the definitions of the pcre2_{malloc,free} functions above the
- compile_pcre2_pattern() function they're used it. Before the preceding
- commit they used to be needed earlier, but now we can move them to be
- adjacent to the other PCREv2 functions.
+ compile_pcre2_pattern() function they're used in.
+
+ Before the preceding commit they used to be needed earlier, but now we
+ can move them to be adjacent to the other PCREv2 functions.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
--
2.30.0.284.gd98b1dd5eaa7
next prev parent reply other threads:[~2021-02-18 0:08 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-16 12:06 [PATCH 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget
2019-10-16 12:06 ` [PATCH 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget
2019-10-16 12:06 ` [PATCH 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget
2019-10-16 12:06 ` [PATCH 3/3] grep: avoid leak of chartables in PCRE2 Johannes Schindelin via GitGitGadget
2019-10-16 12:10 ` [PATCH v2 0/3] Revive 'pcre2-chartables-leakfix' Johannes Schindelin via GitGitGadget
2019-10-16 12:10 ` [PATCH v2 1/3] grep: make PCRE1 aware of custom allocator Carlo Marcelo Arenas Belón via GitGitGadget
2019-10-16 12:10 ` [PATCH v2 2/3] grep: make PCRE2 " Carlo Marcelo Arenas Belón via GitGitGadget
2019-10-18 1:38 ` Junio C Hamano
2021-02-04 21:05 ` [PATCH 00/10] grep/pcre2: memory allocation fixes Ævar Arnfjörð Bjarmason
2021-02-10 21:34 ` Junio C Hamano
2021-02-18 0:07 ` Ævar Arnfjörð Bjarmason [this message]
2021-03-04 0:34 ` [PATCH v2 " Junio C Hamano
2021-02-18 0:07 ` [PATCH v2 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason
2021-02-18 0:07 ` [PATCH v2 02/10] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason
2021-02-18 0:07 ` [PATCH v2 03/10] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason
2021-02-18 0:07 ` [PATCH v2 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason
2021-02-18 0:07 ` [PATCH v2 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason
2021-02-18 0:07 ` [PATCH v2 06/10] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason
2021-03-04 0:14 ` Junio C Hamano
2021-02-18 0:07 ` [PATCH v2 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason
2021-02-18 0:07 ` [PATCH v2 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason
2021-03-04 0:24 ` Junio C Hamano
2021-02-18 0:07 ` [PATCH v2 09/10] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason
2021-03-04 0:27 ` Junio C Hamano
2021-02-18 0:07 ` [PATCH v2 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason
2021-03-04 0:28 ` Junio C Hamano
2021-02-04 21:05 ` [PATCH 01/10] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason
2021-02-04 21:05 ` [PATCH 02/10] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason
2021-02-04 21:05 ` [PATCH 03/10] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason
2021-02-04 21:05 ` [PATCH 04/10] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason
2021-02-04 21:05 ` [PATCH 05/10] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason
2021-02-10 10:38 ` Johannes Schindelin
2021-02-04 21:05 ` [PATCH 06/10] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason
2021-02-04 21:05 ` [PATCH 07/10] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason
2021-02-10 10:43 ` Johannes Schindelin
2021-03-04 0:16 ` Junio C Hamano
2021-02-04 21:05 ` [PATCH 08/10] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason
2021-02-10 12:38 ` Johannes Schindelin
2021-02-04 21:05 ` [PATCH 09/10] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason
2021-02-04 21:05 ` [PATCH 10/10] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason
2021-02-10 12:40 ` Johannes Schindelin
2019-10-16 12:10 ` [PATCH v2 3/3] grep: avoid leak of chartables in PCRE2 Carlo Marcelo Arenas Belón via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210218000728.13995-1-avarab@gmail.com \
--to=avarab@gmail.com \
--cc=carenas@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).