* [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API
2022-12-15 9:11 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
@ 2022-12-15 9:11 ` Ævar Arnfjörð Bjarmason
2022-12-17 12:45 ` René Scharfe
2022-12-15 9:11 ` [RFC PATCH 2/5] various: add missing strvec_clear() Ævar Arnfjörð Bjarmason
` (5 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15 9:11 UTC (permalink / raw)
To: git
Cc: René Scharfe, Junio C Hamano,
Ævar Arnfjörð Bjarmason
When this code was migrated to "struct strvec" (or rather, its
predecessor API) in 8c2cfa55446 (annotate: use argv_array, 2014-07-16)
it didn't take full advantage of what we were given:
* We are always passed the name "annotate" as argv[0] here, so we
don't need to re-hardcode it. We've already declared it in "struct
cmd_struct commands" in git.c.
* We are guaranteed to get at least one argument, so rather than
looping here ourselves let's have strvec_pushv() handle that. If we
only have one argument we'll pass the terminating NULL to it, making
it a NOOP.
This change helps to make the subsequent commit smaller, and as a
bonus removes the type discrepancy between "int" and "size_t".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/annotate.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/builtin/annotate.c b/builtin/annotate.c
index 58ff977a231..e37b269196f 100644
--- a/builtin/annotate.c
+++ b/builtin/annotate.c
@@ -7,16 +7,12 @@
#include "builtin.h"
#include "strvec.h"
-int cmd_annotate(int argc, const char **argv, const char *prefix)
+int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
{
struct strvec args = STRVEC_INIT;
- int i;
- strvec_pushl(&args, "annotate", "-c", NULL);
-
- for (i = 1; i < argc; i++) {
- strvec_push(&args, argv[i]);
- }
+ strvec_pushl(&args, argv[0], "-c", NULL);
+ strvec_pushv(&args, &argv[1]);
return cmd_blame(args.nr, args.v, prefix);
}
--
2.39.0.rc2.1048.g0e5493b8d5b
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API
2022-12-15 9:11 ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45 ` René Scharfe
0 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano
Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> When this code was migrated to "struct strvec" (or rather, its
> predecessor API) in 8c2cfa55446 (annotate: use argv_array, 2014-07-16)
> it didn't take full advantage of what we were given:
>
> * We are always passed the name "annotate" as argv[0] here, so we
> don't need to re-hardcode it. We've already declared it in "struct
> cmd_struct commands" in git.c.
This change has nothing to do with strvec; it could have been done by
e68989a739 (annotate: fix for cvsserver., 2007-02-06) already. I don't
see much of a benefit to it because the string is already there, but if
it's needed then I think it deserves its own patch.
> * We are guaranteed to get at least one argument, so rather than
> looping here ourselves let's have strvec_pushv() handle that. If we
> only have one argument we'll pass the terminating NULL to it, making
> it a NOOP.
85b343245b (argv-array: implement argv_array_pushv(), 2015-06-14) added
the _pushv() function, so 8c2cfa55446 could not have used it.
> This change helps to make the subsequent commit smaller, and as a
> bonus removes the type discrepancy between "int" and "size_t".
This type mismatch is benign as long as we get a valid argc value,
as positive int fit into size_t. Getting rid of the loop is nice,
though.
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> builtin/annotate.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/annotate.c b/builtin/annotate.c
> index 58ff977a231..e37b269196f 100644
> --- a/builtin/annotate.c
> +++ b/builtin/annotate.c
> @@ -7,16 +7,12 @@
> #include "builtin.h"
> #include "strvec.h"
>
> -int cmd_annotate(int argc, const char **argv, const char *prefix)
> +int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
> {
> struct strvec args = STRVEC_INIT;
> - int i;
>
> - strvec_pushl(&args, "annotate", "-c", NULL);
> -
> - for (i = 1; i < argc; i++) {
> - strvec_push(&args, argv[i]);
> - }
> + strvec_pushl(&args, argv[0], "-c", NULL);
> + strvec_pushv(&args, &argv[1]);
^^^^^^^^
"argv + 1" would be easier to read.
>
> return cmd_blame(args.nr, args.v, prefix);
> }
^ permalink raw reply [flat|nested] 24+ messages in thread
* [RFC PATCH 2/5] various: add missing strvec_clear()
2022-12-15 9:11 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
2022-12-15 9:11 ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
@ 2022-12-15 9:11 ` Ævar Arnfjörð Bjarmason
2022-12-15 9:11 ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
6 siblings, 0 replies; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15 9:11 UTC (permalink / raw)
To: git
Cc: René Scharfe, Junio C Hamano,
Ævar Arnfjörð Bjarmason
Fix or partially fix memory leaks that happened because a
strvec_clear() was missing. The "partially" will be addressed in a
subsequent commit, we'll still leak in cases where the function we're
calling munges our "v" member.
In the case of "builtin/describe.c" let's change it to use the macro
initializer rather than the strvec_init() while we're at it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/annotate.c | 5 ++++-
builtin/describe.c | 8 +++++---
builtin/stash.c | 6 +++++-
builtin/upload-archive.c | 7 +++++--
4 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/builtin/annotate.c b/builtin/annotate.c
index e37b269196f..de58deadfc7 100644
--- a/builtin/annotate.c
+++ b/builtin/annotate.c
@@ -10,9 +10,12 @@
int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
{
struct strvec args = STRVEC_INIT;
+ int ret;
strvec_pushl(&args, argv[0], "-c", NULL);
strvec_pushv(&args, &argv[1]);
- return cmd_blame(args.nr, args.v, prefix);
+ ret = cmd_blame(args.nr, args.v, prefix);
+ strvec_clear(&args);
+ return ret;
}
diff --git a/builtin/describe.c b/builtin/describe.c
index eea1e330c00..cb205f6b561 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -597,9 +597,9 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
if (contains) {
struct string_list_item *item;
- struct strvec args;
+ struct strvec args = STRVEC_INIT;
+ int ret;
- strvec_init(&args);
strvec_pushl(&args, "name-rev",
"--peel-tag", "--name-only", "--no-undefined",
NULL);
@@ -616,7 +616,9 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
strvec_pushv(&args, argv);
else
strvec_push(&args, "HEAD");
- return cmd_name_rev(args.nr, args.v, prefix);
+ ret = cmd_name_rev(args.nr, args.v, prefix);
+ strvec_clear(&args);
+ return ret;
}
hashmap_init(&names, commit_name_neq, NULL, 0);
diff --git a/builtin/stash.c b/builtin/stash.c
index bb0fd861434..e504e22e0b9 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -961,6 +961,7 @@ static int show_stash(int argc, const char **argv, const char *prefix)
ret = diff_result_code(&rev.diffopt, 0);
cleanup:
strvec_clear(&stash_args);
+ strvec_clear(&revision_args);
free_stash_info(&info);
release_revisions(&rev);
if (do_usage)
@@ -1838,6 +1839,7 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
OPT_SUBCOMMAND_F("save", &fn, save_stash, PARSE_OPT_NOCOMPLETE),
OPT_END()
};
+ int ret;
git_config(git_stash_config, NULL);
@@ -1861,5 +1863,7 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
/* Assume 'stash push' */
strvec_push(&args, "push");
strvec_pushv(&args, argv);
- return !!push_stash(args.nr, args.v, prefix, 1);
+ ret = !!push_stash(args.nr, args.v, prefix, 1);
+ strvec_clear(&args);
+ return ret;
}
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 945ee2b4126..6ef0d06ee8b 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -21,6 +21,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
{
struct strvec sent_argv = STRVEC_INIT;
const char *arg_cmd = "argument ";
+ int ret;
if (argc != 2 || !strcmp(argv[1], "-h"))
usage(upload_archive_usage);
@@ -45,8 +46,10 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
}
/* parse all options sent by the client */
- return write_archive(sent_argv.nr, sent_argv.v, prefix,
- the_repository, NULL, 1);
+ ret = write_archive(sent_argv.nr, sent_argv.v, prefix,
+ the_repository, NULL, 1);
+ strvec_clear(&sent_argv);
+ return ret;
}
__attribute__((format (printf, 1, 2)))
--
2.39.0.rc2.1048.g0e5493b8d5b
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP"
2022-12-15 9:11 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
2022-12-15 9:11 ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
2022-12-15 9:11 ` [RFC PATCH 2/5] various: add missing strvec_clear() Ævar Arnfjörð Bjarmason
@ 2022-12-15 9:11 ` Ævar Arnfjörð Bjarmason
2022-12-17 12:45 ` René Scharfe
2022-12-15 9:11 ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15 9:11 UTC (permalink / raw)
To: git
Cc: René Scharfe, Junio C Hamano,
Ævar Arnfjörð Bjarmason
We have various tricky cases where we'll leak a "struct strvec" even
when we call strvec_clear(), these happen because we'll call
setup_revisions(), parse_options() etc., which will munge our "v"
member.
There's various potential ways to deal with that, see the extensive
on-list discussion at [1]. One way would be to pass a flag to ask the
underlying API to free() these, as was done for setup_revisions() in
[2].
But we don't need that complexity for many common cases, which are
pushing fixed strings to the "struct strvec". Let's instead add a flag
analogous to the "strdup_strings" flag in the "struct string_list". A
subsequent commit will make use of this API.
Implementation notes: The BUG_unless_dup() is implemented as a macro
so we'll report the correct line number on BUG(). The "nodup_strings"
flag could have been named a "strdup_strings" for consistency with the
"struct string_list" API, but to do so we'd have to be confident that
we've spotted all callers that assume that they can memset() a "struct
strvec" to zero.
1. https://lore.kernel.org/git/221214.86ilie48cv.gmgdl@evledraar.gmail.com/
2. f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
need free()-ing, 2022-08-02)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
strvec.c | 20 ++++++++++++++++++--
strvec.h | 30 +++++++++++++++++++++++++++++-
2 files changed, 47 insertions(+), 3 deletions(-)
diff --git a/strvec.c b/strvec.c
index 61a76ce6cb9..721f8e94a50 100644
--- a/strvec.c
+++ b/strvec.c
@@ -10,6 +10,16 @@ void strvec_init(struct strvec *array)
memcpy(array, &blank, sizeof(*array));
}
+void strvec_init_nodup(struct strvec *array)
+{
+ struct strvec blank = STRVEC_INIT_NODUP;
+ memcpy(array, &blank, sizeof(*array));
+}
+
+#define BUG_unless_dup(array, fn) \
+ if ((array)->nodup_strings) \
+ BUG("cannot %s() on a 'STRVEC_INIT_NODUP' strvec", (fn))
+
static void strvec_push_nodup(struct strvec *array, const char *value)
{
if (array->v == empty_strvec)
@@ -22,7 +32,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value)
const char *strvec_push(struct strvec *array, const char *value)
{
- strvec_push_nodup(array, xstrdup(value));
+ const char *to_push = array->nodup_strings ? value : xstrdup(value);
+
+ strvec_push_nodup(array, to_push);
return array->v[array->nr - 1];
}
@@ -31,6 +43,8 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...)
va_list ap;
struct strbuf v = STRBUF_INIT;
+ BUG_unless_dup(array, "strvec_pushf");
+
va_start(ap, fmt);
strbuf_vaddf(&v, fmt, ap);
va_end(ap);
@@ -67,6 +81,8 @@ void strvec_pop(struct strvec *array)
void strvec_split(struct strvec *array, const char *to_split)
{
+ BUG_unless_dup(array, "strvec_pushf");
+
while (isspace(*to_split))
to_split++;
for (;;) {
@@ -89,7 +105,7 @@ void strvec_clear(struct strvec *array)
{
if (array->v != empty_strvec) {
int i;
- for (i = 0; i < array->nr; i++)
+ for (i = 0; !array->nodup_strings && i < array->nr; i++)
free((char *)array->v[i]);
free(array->v);
}
diff --git a/strvec.h b/strvec.h
index 9f55c8766ba..b122b87b369 100644
--- a/strvec.h
+++ b/strvec.h
@@ -26,29 +26,51 @@ extern const char *empty_strvec[];
* member contains the actual array; the `nr` member contains the
* number of elements in the array, not including the terminating
* NULL.
+ *
+ * When using `STRVEC_INIT_NODUP` to initialize it the `nodup_strings'
+ * member is set, and individual members of the "struct strvec" will
+ * not be free()'d by strvec_clear(). This is for fixed string
+ * arguments to parse_options() and others that might munge the "v"
+ * itself.
*/
struct strvec {
const char **v;
size_t nr;
size_t alloc;
+ unsigned int nodup_strings:1;
};
#define STRVEC_INIT { \
.v = empty_strvec, \
}
+#define STRVEC_INIT_NODUP { \
+ .v = empty_strvec, \
+ .nodup_strings = 1, \
+}
+
/**
* Initialize an array. This is no different than assigning from
* `STRVEC_INIT`.
*/
void strvec_init(struct strvec *);
+/**
+ * Initialize a "nodup" array. This is no different than assigning from
+ * `STRVEC_INIT_NODUP`.
+ */
+void strvec_init_nodup(struct strvec *);
+
/* Push a copy of a string onto the end of the array. */
const char *strvec_push(struct strvec *, const char *);
/**
* Format a string and push it onto the end of the array. This is a
* convenience wrapper combining `strbuf_addf` and `strvec_push`.
+ *
+ * This is incompatible with arrays initialized with
+ * `STRVEC_INIT_NODUP`, as pushing the formatted string requires the
+ * equivalent of an xstrfmt().
*/
__attribute__((format (printf,2,3)))
const char *strvec_pushf(struct strvec *, const char *fmt, ...);
@@ -70,7 +92,13 @@ void strvec_pushv(struct strvec *, const char **);
*/
void strvec_pop(struct strvec *);
-/* Splits by whitespace; does not handle quoted arguments! */
+/**
+ * Splits by whitespace; does not handle quoted arguments!
+ *
+ * This is incompatible with arrays initialized with
+ * `STRVEC_INIT_NODUP`, as pushing the elements requires an xstrndup()
+ * call.
+ */
void strvec_split(struct strvec *, const char *);
/**
--
2.39.0.rc2.1048.g0e5493b8d5b
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP"
2022-12-15 9:11 ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45 ` René Scharfe
0 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano
Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> We have various tricky cases where we'll leak a "struct strvec" even
> when we call strvec_clear(), these happen because we'll call
> setup_revisions(), parse_options() etc., which will munge our "v"
> member.
>
> There's various potential ways to deal with that, see the extensive
> on-list discussion at [1]. One way would be to pass a flag to ask the
> underlying API to free() these, as was done for setup_revisions() in
> [2].
>
> But we don't need that complexity for many common cases, which are
> pushing fixed strings to the "struct strvec". Let's instead add a flag
> analogous to the "strdup_strings" flag in the "struct string_list". A
> subsequent commit will make use of this API.
>
> Implementation notes: The BUG_unless_dup() is implemented as a macro
> so we'll report the correct line number on BUG(). The "nodup_strings"
> flag could have been named a "strdup_strings" for consistency with the
> "struct string_list" API, but to do so we'd have to be confident that
> we've spotted all callers that assume that they can memset() a "struct
> strvec" to zero.
If the two variants would get separate types then the compiler would
take care of these checks. Conceptually they are different enough and
have different operations; they only share the same struct layout.
But I doubt a _nodup variant as its own type would pull its weight.
It would basically be a trivial wrapper around REALLOC_ARRAY..
>
> 1. https://lore.kernel.org/git/221214.86ilie48cv.gmgdl@evledraar.gmail.com/
> 2. f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
> need free()-ing, 2022-08-02)
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> strvec.c | 20 ++++++++++++++++++--
> strvec.h | 30 +++++++++++++++++++++++++++++-
> 2 files changed, 47 insertions(+), 3 deletions(-)
>
> diff --git a/strvec.c b/strvec.c
> index 61a76ce6cb9..721f8e94a50 100644
> --- a/strvec.c
> +++ b/strvec.c
> @@ -10,6 +10,16 @@ void strvec_init(struct strvec *array)
> memcpy(array, &blank, sizeof(*array));
> }
>
> +void strvec_init_nodup(struct strvec *array)
> +{
> + struct strvec blank = STRVEC_INIT_NODUP;
> + memcpy(array, &blank, sizeof(*array));
> +}
> +
> +#define BUG_unless_dup(array, fn) \
> + if ((array)->nodup_strings) \
> + BUG("cannot %s() on a 'STRVEC_INIT_NODUP' strvec", (fn))
> +
> static void strvec_push_nodup(struct strvec *array, const char *value)
> {
> if (array->v == empty_strvec)
> @@ -22,7 +32,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value)
>
> const char *strvec_push(struct strvec *array, const char *value)
> {
> - strvec_push_nodup(array, xstrdup(value));
> + const char *to_push = array->nodup_strings ? value : xstrdup(value);
> +
> + strvec_push_nodup(array, to_push);
> return array->v[array->nr - 1];
> }
>
> @@ -31,6 +43,8 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...)
> va_list ap;
> struct strbuf v = STRBUF_INIT;
>
> + BUG_unless_dup(array, "strvec_pushf");
> +
> va_start(ap, fmt);
> strbuf_vaddf(&v, fmt, ap);
> va_end(ap);
> @@ -67,6 +81,8 @@ void strvec_pop(struct strvec *array)
>
> void strvec_split(struct strvec *array, const char *to_split)
> {
> + BUG_unless_dup(array, "strvec_pushf");
> +
> while (isspace(*to_split))
> to_split++;
> for (;;) {
> @@ -89,7 +105,7 @@ void strvec_clear(struct strvec *array)
> {
> if (array->v != empty_strvec) {
> int i;
> - for (i = 0; i < array->nr; i++)
> + for (i = 0; !array->nodup_strings && i < array->nr; i++)
> free((char *)array->v[i]);
> free(array->v);
> }
> diff --git a/strvec.h b/strvec.h
> index 9f55c8766ba..b122b87b369 100644
> --- a/strvec.h
> +++ b/strvec.h
> @@ -26,29 +26,51 @@ extern const char *empty_strvec[];
> * member contains the actual array; the `nr` member contains the
> * number of elements in the array, not including the terminating
> * NULL.
> + *
> + * When using `STRVEC_INIT_NODUP` to initialize it the `nodup_strings'
> + * member is set, and individual members of the "struct strvec" will
> + * not be free()'d by strvec_clear(). This is for fixed string
> + * arguments to parse_options() and others that might munge the "v"
> + * itself.
> */
> struct strvec {
> const char **v;
> size_t nr;
> size_t alloc;
> + unsigned int nodup_strings:1;
> };
>
> #define STRVEC_INIT { \
> .v = empty_strvec, \
> }
>
> +#define STRVEC_INIT_NODUP { \
> + .v = empty_strvec, \
> + .nodup_strings = 1, \
> +}
> +
> /**
> * Initialize an array. This is no different than assigning from
> * `STRVEC_INIT`.
> */
> void strvec_init(struct strvec *);
>
> +/**
> + * Initialize a "nodup" array. This is no different than assigning from
> + * `STRVEC_INIT_NODUP`.
> + */
> +void strvec_init_nodup(struct strvec *);
> +
> /* Push a copy of a string onto the end of the array. */
> const char *strvec_push(struct strvec *, const char *);
>
> /**
> * Format a string and push it onto the end of the array. This is a
> * convenience wrapper combining `strbuf_addf` and `strvec_push`.
> + *
> + * This is incompatible with arrays initialized with
> + * `STRVEC_INIT_NODUP`, as pushing the formatted string requires the
> + * equivalent of an xstrfmt().
> */
> __attribute__((format (printf,2,3)))
> const char *strvec_pushf(struct strvec *, const char *fmt, ...);
> @@ -70,7 +92,13 @@ void strvec_pushv(struct strvec *, const char **);
> */
> void strvec_pop(struct strvec *);
>
> -/* Splits by whitespace; does not handle quoted arguments! */
> +/**
> + * Splits by whitespace; does not handle quoted arguments!
> + *
> + * This is incompatible with arrays initialized with
> + * `STRVEC_INIT_NODUP`, as pushing the elements requires an xstrndup()
> + * call.
> + */
> void strvec_split(struct strvec *, const char *);
>
> /**
^ permalink raw reply [flat|nested] 24+ messages in thread
* [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP"
2022-12-15 9:11 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
` (2 preceding siblings ...)
2022-12-15 9:11 ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
@ 2022-12-15 9:11 ` Ævar Arnfjörð Bjarmason
2022-12-17 12:45 ` René Scharfe
2022-12-15 9:11 ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15 9:11 UTC (permalink / raw)
To: git
Cc: René Scharfe, Junio C Hamano,
Ævar Arnfjörð Bjarmason
For these cases where all of the strings we're pushing to the "struct
strvec" are fixed strings we can fix widespread memory leaks by
skipping the xstrdup() on strvec_push().
More in-tree users could benefit from this to save needless
xstrdup()-ing, but for all of these we were munging the "v" member, so
the subsequent strvec_clear() wouldn't free the memory.
Now there's no need to free the individual elements, but we'll still
need to free the container with the strvec_clear().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/am.c | 2 +-
builtin/annotate.c | 2 +-
builtin/stash.c | 2 +-
t/t0023-crlf-am.sh | 1 +
t/t4152-am-subjects.sh | 2 ++
t/t4254-am-corrupt.sh | 2 ++
t/t4256-am-format-flowed.sh | 1 +
t/t4257-am-interactive.sh | 2 ++
t/t5403-post-checkout-hook.sh | 1 +
9 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/builtin/am.c b/builtin/am.c
index 30c9b3a9cd7..691ec1d152d 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail)
static int run_apply(const struct am_state *state, const char *index_file)
{
struct strvec apply_paths = STRVEC_INIT;
- struct strvec apply_opts = STRVEC_INIT;
+ struct strvec apply_opts = STRVEC_INIT_NODUP;
struct apply_state apply_state;
int res, opts_left;
int force_apply = 0;
diff --git a/builtin/annotate.c b/builtin/annotate.c
index de58deadfc7..99d97c1a8c0 100644
--- a/builtin/annotate.c
+++ b/builtin/annotate.c
@@ -9,7 +9,7 @@
int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
{
- struct strvec args = STRVEC_INIT;
+ struct strvec args = STRVEC_INIT_NODUP;
int ret;
strvec_pushl(&args, argv[0], "-c", NULL);
diff --git a/builtin/stash.c b/builtin/stash.c
index e504e22e0b9..b15dd2ebb3c 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1823,7 +1823,7 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
{
pid_t pid = getpid();
const char *index_file;
- struct strvec args = STRVEC_INIT;
+ struct strvec args = STRVEC_INIT_NODUP;
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
OPT_SUBCOMMAND("apply", &fn, apply_stash),
diff --git a/t/t0023-crlf-am.sh b/t/t0023-crlf-am.sh
index f9bbb91f64e..575805513a3 100755
--- a/t/t0023-crlf-am.sh
+++ b/t/t0023-crlf-am.sh
@@ -2,6 +2,7 @@
test_description='Test am with auto.crlf'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
cat >patchfile <<\EOF
diff --git a/t/t4152-am-subjects.sh b/t/t4152-am-subjects.sh
index 4c68245acad..9f2edba1f83 100755
--- a/t/t4152-am-subjects.sh
+++ b/t/t4152-am-subjects.sh
@@ -1,6 +1,8 @@
#!/bin/sh
test_description='test subject preservation with format-patch | am'
+
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
make_patches() {
diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index 54be7da1611..45f1d4f95e5 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -1,6 +1,8 @@
#!/bin/sh
test_description='git am with corrupt input'
+
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
make_mbox_with_nul () {
diff --git a/t/t4256-am-format-flowed.sh b/t/t4256-am-format-flowed.sh
index 2369c4e17ad..1015273bc82 100755
--- a/t/t4256-am-format-flowed.sh
+++ b/t/t4256-am-format-flowed.sh
@@ -2,6 +2,7 @@
test_description='test format=flowed support of git am'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success 'setup' '
diff --git a/t/t4257-am-interactive.sh b/t/t4257-am-interactive.sh
index aed8f4de3d6..f26d7fd2dbd 100755
--- a/t/t4257-am-interactive.sh
+++ b/t/t4257-am-interactive.sh
@@ -1,6 +1,8 @@
#!/bin/sh
test_description='am --interactive tests'
+
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success 'set up patches to apply' '
diff --git a/t/t5403-post-checkout-hook.sh b/t/t5403-post-checkout-hook.sh
index 978f240cdac..cfaae547398 100755
--- a/t/t5403-post-checkout-hook.sh
+++ b/t/t5403-post-checkout-hook.sh
@@ -7,6 +7,7 @@ test_description='Test the post-checkout hook.'
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success setup '
--
2.39.0.rc2.1048.g0e5493b8d5b
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP"
2022-12-15 9:11 ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45 ` René Scharfe
0 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano
Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> For these cases where all of the strings we're pushing to the "struct
> strvec" are fixed strings we can fix widespread memory leaks by
> skipping the xstrdup() on strvec_push().
>
> More in-tree users could benefit from this to save needless
> xstrdup()-ing, but for all of these we were munging the "v" member, so
> the subsequent strvec_clear() wouldn't free the memory.
>
> Now there's no need to free the individual elements, but we'll still
> need to free the container with the strvec_clear().
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> builtin/am.c | 2 +-
> builtin/annotate.c | 2 +-
> builtin/stash.c | 2 +-
> t/t0023-crlf-am.sh | 1 +
> t/t4152-am-subjects.sh | 2 ++
> t/t4254-am-corrupt.sh | 2 ++
> t/t4256-am-format-flowed.sh | 1 +
> t/t4257-am-interactive.sh | 2 ++
> t/t5403-post-checkout-hook.sh | 1 +
> 9 files changed, 12 insertions(+), 3 deletions(-)
Looks nice and easy, though the test updates are a bit distracting.
They are all enabled by the change to builtin/am.c, right?
>
> diff --git a/builtin/am.c b/builtin/am.c
> index 30c9b3a9cd7..691ec1d152d 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail)
> static int run_apply(const struct am_state *state, const char *index_file)
> {
> struct strvec apply_paths = STRVEC_INIT;
> - struct strvec apply_opts = STRVEC_INIT;
> + struct strvec apply_opts = STRVEC_INIT_NODUP;
> struct apply_state apply_state;
> int res, opts_left;
> int force_apply = 0;
> diff --git a/builtin/annotate.c b/builtin/annotate.c
> index de58deadfc7..99d97c1a8c0 100644
> --- a/builtin/annotate.c
> +++ b/builtin/annotate.c
> @@ -9,7 +9,7 @@
>
> int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
> {
> - struct strvec args = STRVEC_INIT;
> + struct strvec args = STRVEC_INIT_NODUP;
> int ret;
>
> strvec_pushl(&args, argv[0], "-c", NULL);
> diff --git a/builtin/stash.c b/builtin/stash.c
> index e504e22e0b9..b15dd2ebb3c 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -1823,7 +1823,7 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
> {
> pid_t pid = getpid();
> const char *index_file;
> - struct strvec args = STRVEC_INIT;
> + struct strvec args = STRVEC_INIT_NODUP;
> parse_opt_subcommand_fn *fn = NULL;
> struct option options[] = {
> OPT_SUBCOMMAND("apply", &fn, apply_stash),
> diff --git a/t/t0023-crlf-am.sh b/t/t0023-crlf-am.sh
> index f9bbb91f64e..575805513a3 100755
> --- a/t/t0023-crlf-am.sh
> +++ b/t/t0023-crlf-am.sh
> @@ -2,6 +2,7 @@
>
> test_description='Test am with auto.crlf'
>
> +TEST_PASSES_SANITIZE_LEAK=true
> . ./test-lib.sh
>
> cat >patchfile <<\EOF
> diff --git a/t/t4152-am-subjects.sh b/t/t4152-am-subjects.sh
> index 4c68245acad..9f2edba1f83 100755
> --- a/t/t4152-am-subjects.sh
> +++ b/t/t4152-am-subjects.sh
> @@ -1,6 +1,8 @@
> #!/bin/sh
>
> test_description='test subject preservation with format-patch | am'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> . ./test-lib.sh
>
> make_patches() {
> diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
> index 54be7da1611..45f1d4f95e5 100755
> --- a/t/t4254-am-corrupt.sh
> +++ b/t/t4254-am-corrupt.sh
> @@ -1,6 +1,8 @@
> #!/bin/sh
>
> test_description='git am with corrupt input'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> . ./test-lib.sh
>
> make_mbox_with_nul () {
> diff --git a/t/t4256-am-format-flowed.sh b/t/t4256-am-format-flowed.sh
> index 2369c4e17ad..1015273bc82 100755
> --- a/t/t4256-am-format-flowed.sh
> +++ b/t/t4256-am-format-flowed.sh
> @@ -2,6 +2,7 @@
>
> test_description='test format=flowed support of git am'
>
> +TEST_PASSES_SANITIZE_LEAK=true
> . ./test-lib.sh
>
> test_expect_success 'setup' '
> diff --git a/t/t4257-am-interactive.sh b/t/t4257-am-interactive.sh
> index aed8f4de3d6..f26d7fd2dbd 100755
> --- a/t/t4257-am-interactive.sh
> +++ b/t/t4257-am-interactive.sh
> @@ -1,6 +1,8 @@
> #!/bin/sh
>
> test_description='am --interactive tests'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> . ./test-lib.sh
>
> test_expect_success 'set up patches to apply' '
> diff --git a/t/t5403-post-checkout-hook.sh b/t/t5403-post-checkout-hook.sh
> index 978f240cdac..cfaae547398 100755
> --- a/t/t5403-post-checkout-hook.sh
> +++ b/t/t5403-post-checkout-hook.sh
> @@ -7,6 +7,7 @@ test_description='Test the post-checkout hook.'
> GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>
> +TEST_PASSES_SANITIZE_LEAK=true
> . ./test-lib.sh
>
> test_expect_success setup '
^ permalink raw reply [flat|nested] 24+ messages in thread
* [RFC PATCH 5/5] strvec API users: fix more leaks by using "STRVEC_INIT_NODUP"
2022-12-15 9:11 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
` (3 preceding siblings ...)
2022-12-15 9:11 ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
@ 2022-12-15 9:11 ` Ævar Arnfjörð Bjarmason
2022-12-17 12:45 ` René Scharfe
2022-12-17 12:45 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks René Scharfe
2022-12-17 13:13 ` Jeff King
6 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15 9:11 UTC (permalink / raw)
To: git
Cc: René Scharfe, Junio C Hamano,
Ævar Arnfjörð Bjarmason
For these cases where most of the the strings we were pushing to the
"struct strvec" were fixed strings, but others are dynamically
allocated. For the latter we keep around a "to_free" list of them.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/describe.c | 22 +++++++++++++++++-----
builtin/upload-archive.c | 9 +++++++--
t/t5003-archive-zip.sh | 1 +
3 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/builtin/describe.c b/builtin/describe.c
index cb205f6b561..eda59ebb19a 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -596,8 +596,10 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
die(_("options '%s' and '%s' cannot be used together"), "--long", "--abbrev=0");
if (contains) {
+ struct string_list to_free = STRING_LIST_INIT_DUP;
struct string_list_item *item;
- struct strvec args = STRVEC_INIT;
+ struct strvec args = STRVEC_INIT_NODUP;
+
int ret;
strvec_pushl(&args, "name-rev",
@@ -607,10 +609,19 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
strvec_push(&args, "--always");
if (!all) {
strvec_push(&args, "--tags");
- for_each_string_list_item(item, &patterns)
- strvec_pushf(&args, "--refs=refs/tags/%s", item->string);
- for_each_string_list_item(item, &exclude_patterns)
- strvec_pushf(&args, "--exclude=refs/tags/%s", item->string);
+ for_each_string_list_item(item, &patterns) {
+ char *str = xstrfmt("--refs=refs/tags/%s", item->string);
+ const char *item = string_list_append_nodup(&to_free, str)->string;
+
+ strvec_push(&args, item);
+ }
+ for_each_string_list_item(item, &exclude_patterns) {
+ char *str = xstrfmt("--exclude=refs/tags/%s", item->string);
+
+ const char *item = string_list_append_nodup(&to_free, str)->string;
+
+ strvec_push(&args, item);
+ }
}
if (argc)
strvec_pushv(&args, argv);
@@ -618,6 +629,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
strvec_push(&args, "HEAD");
ret = cmd_name_rev(args.nr, args.v, prefix);
strvec_clear(&args);
+ string_list_clear(&to_free, 0);
return ret;
}
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 6ef0d06ee8b..95c6cdc4be0 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -19,7 +19,8 @@ static const char deadchild[] =
int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
{
- struct strvec sent_argv = STRVEC_INIT;
+ struct string_list to_free = STRING_LIST_INIT_DUP;
+ struct strvec sent_argv = STRVEC_INIT_NODUP;
const char *arg_cmd = "argument ";
int ret;
@@ -34,6 +35,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
/* put received options in sent_argv[] */
strvec_push(&sent_argv, "git-upload-archive");
for (;;) {
+ struct string_list_item *item;
char *buf = packet_read_line(0, NULL);
if (!buf)
break; /* got a flush */
@@ -42,13 +44,16 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
if (!starts_with(buf, arg_cmd))
die("'argument' token or flush expected");
- strvec_push(&sent_argv, buf + strlen(arg_cmd));
+
+ item = string_list_append(&to_free, buf + strlen(arg_cmd));
+ strvec_push(&sent_argv, item->string);
}
/* parse all options sent by the client */
ret = write_archive(sent_argv.nr, sent_argv.v, prefix,
the_repository, NULL, 1);
strvec_clear(&sent_argv);
+ string_list_clear(&to_free, 0);
return ret;
}
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index fc499cdff01..cc1ce060558 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -2,6 +2,7 @@
test_description='git archive --format=zip test'
+TEST_PASSES_SANITIZE_LEAK=true
TEST_CREATE_REPO_NO_TEMPLATE=1
. ./test-lib.sh
--
2.39.0.rc2.1048.g0e5493b8d5b
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 5/5] strvec API users: fix more leaks by using "STRVEC_INIT_NODUP"
2022-12-15 9:11 ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45 ` René Scharfe
0 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano
Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> For these cases where most of the the strings we were pushing to the
> "struct strvec" were fixed strings, but others are dynamically
> allocated. For the latter we keep around a "to_free" list of them.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> builtin/describe.c | 22 +++++++++++++++++-----
> builtin/upload-archive.c | 9 +++++++--
> t/t5003-archive-zip.sh | 1 +
> 3 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/describe.c b/builtin/describe.c
> index cb205f6b561..eda59ebb19a 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -596,8 +596,10 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
> die(_("options '%s' and '%s' cannot be used together"), "--long", "--abbrev=0");
>
> if (contains) {
> + struct string_list to_free = STRING_LIST_INIT_DUP;
> struct string_list_item *item;
> - struct strvec args = STRVEC_INIT;
> + struct strvec args = STRVEC_INIT_NODUP;
> +
> int ret;
>
> strvec_pushl(&args, "name-rev",
> @@ -607,10 +609,19 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
> strvec_push(&args, "--always");
> if (!all) {
> strvec_push(&args, "--tags");
> - for_each_string_list_item(item, &patterns)
> - strvec_pushf(&args, "--refs=refs/tags/%s", item->string);
> - for_each_string_list_item(item, &exclude_patterns)
> - strvec_pushf(&args, "--exclude=refs/tags/%s", item->string);
> + for_each_string_list_item(item, &patterns) {
> + char *str = xstrfmt("--refs=refs/tags/%s", item->string);
> + const char *item = string_list_append_nodup(&to_free, str)->string;
> +
> + strvec_push(&args, item);
> + }
> + for_each_string_list_item(item, &exclude_patterns) {
> + char *str = xstrfmt("--exclude=refs/tags/%s", item->string);
> +
> + const char *item = string_list_append_nodup(&to_free, str)->string;
> +
> + strvec_push(&args, item);
> + }
Having to track allocations separately for each option is tedious, as
we see here.
> }
> if (argc)
> strvec_pushv(&args, argv);
> @@ -618,6 +629,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
> strvec_push(&args, "HEAD");
> ret = cmd_name_rev(args.nr, args.v, prefix);
> strvec_clear(&args);
> + string_list_clear(&to_free, 0);
> return ret;
> }
>
> diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> index 6ef0d06ee8b..95c6cdc4be0 100644
> --- a/builtin/upload-archive.c
> +++ b/builtin/upload-archive.c
> @@ -19,7 +19,8 @@ static const char deadchild[] =
>
> int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
> {
> - struct strvec sent_argv = STRVEC_INIT;
> + struct string_list to_free = STRING_LIST_INIT_DUP;
> + struct strvec sent_argv = STRVEC_INIT_NODUP;
> const char *arg_cmd = "argument ";
> int ret;
>
> @@ -34,6 +35,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
> /* put received options in sent_argv[] */
> strvec_push(&sent_argv, "git-upload-archive");
> for (;;) {
> + struct string_list_item *item;
> char *buf = packet_read_line(0, NULL);
> if (!buf)
> break; /* got a flush */
> @@ -42,13 +44,16 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
>
> if (!starts_with(buf, arg_cmd))
> die("'argument' token or flush expected");
> - strvec_push(&sent_argv, buf + strlen(arg_cmd));
> +
> + item = string_list_append(&to_free, buf + strlen(arg_cmd));
> + strvec_push(&sent_argv, item->string);
> }
>
> /* parse all options sent by the client */
> ret = write_archive(sent_argv.nr, sent_argv.v, prefix,
> the_repository, NULL, 1);
> strvec_clear(&sent_argv);
> + string_list_clear(&to_free, 0);
> return ret;
> }
>
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index fc499cdff01..cc1ce060558 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -2,6 +2,7 @@
>
> test_description='git archive --format=zip test'
>
> +TEST_PASSES_SANITIZE_LEAK=true
> TEST_CREATE_REPO_NO_TEMPLATE=1
> . ./test-lib.sh
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
2022-12-15 9:11 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
` (4 preceding siblings ...)
2022-12-15 9:11 ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45 ` René Scharfe
2022-12-17 13:13 ` Jeff King
6 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano
Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> This is an alternative to René's [1], his already fixes a leak in "git
> am", and this could be done later, so I'm submitting it as RFC, but it
> could also replace it.
>
> I think as this series shows extending the "strvec" API to get a
> feature that works like the existing "strdup_strings" that the "struct
> string_list" has can make memory management much simpler.
>
> The 4/5 here shows cases where we were leaking because our "v" was
> clobbered, but where all the strings we were pushing to the "strvec"
> were fixed strings, so we could skip xstrdup()-ing them.
>
> The 5/5 then shows more complex cases where we have mixed-use,
> i.e. most strings are fixed, but some are not. For those we use a
> "struct string_list to_free = STRING_LIST_INIT_DUP", which we then
> push to the "to_free" list with "string_list_append_nodup()".
>
> This does make the API slightly more dangerous to use, as it's no
> longer guaranteed that it owns all the members it points to. But as
> the "struct string_list" has shown this isn't an issue in practice,
> and e.g. SANITIZE=address et al are good about finding double-frees,
> or frees of fixed strings.
>
> A branch & CI for this are found at [2].
>
> 1. https://lore.kernel.org/git/baf93e4a-7f05-857c-e551-09675496c03c@web.de/
> 2. https://github.com/avar/git/tree/avar/add-and-use-strvec-init-nodup
>
> Ævar Arnfjörð Bjarmason (5):
> builtin/annotate.c: simplify for strvec API
> various: add missing strvec_clear()
> strvec API: add a "STRVEC_INIT_NODUP"
> strvec API users: fix leaks by using "STRVEC_INIT_NODUP"
> strvec API users: fix more leaks by using "STRVEC_INIT_NODUP"
>
> builtin/am.c | 2 +-
> builtin/annotate.c | 17 ++++++++---------
> builtin/describe.c | 28 +++++++++++++++++++++-------
> builtin/stash.c | 8 ++++++--
> builtin/upload-archive.c | 16 ++++++++++++----
> strvec.c | 20 ++++++++++++++++++--
> strvec.h | 30 +++++++++++++++++++++++++++++-
> t/t0023-crlf-am.sh | 1 +
> t/t4152-am-subjects.sh | 2 ++
> t/t4254-am-corrupt.sh | 2 ++
> t/t4256-am-format-flowed.sh | 1 +
> t/t4257-am-interactive.sh | 2 ++
> t/t5003-archive-zip.sh | 1 +
> t/t5403-post-checkout-hook.sh | 1 +
> 14 files changed, 105 insertions(+), 26 deletions(-)
>
This complicates strvec to gain functionality that we basically already
have with REALLOC_ARRAY. It allows for nice conversions in some cases
(patch 4), but places that require some kind of double accounting become
quite nasty (patch 5). I'd prefer to keep strvec simple. Avoiding
allocations isn't necessary because the number of options to parse is
low -- we just need to keep track of and release them at the end.
The problem here is that parse_options() takes a string list in the form
of an int paired with a const char ** and modifies this list in place.
That's fine if we don't care about the memory ownership of the list and
its elements, e.g. because we get it from the OS (main() style) or
accept leakage.
It's not compatible with the list being a strvec that we don't want to
leak, though. What we need is something that translates from strvec to
the argc/argv convention. If we allow leaks this is trivial: Just use
the .v member directly. If we don't then it is still simple: Make a
shallow copy of the .v member, release it after use. Demo patch below.
builtin/am.c | 5 ++++-
builtin/annotate.c | 9 ++++++++-
builtin/describe.c | 8 +++++++-
builtin/stash.c | 10 +++++++++-
builtin/upload-archive.c | 11 +++++++++--
strvec.c | 9 +++++++++
strvec.h | 7 +++++++
7 files changed, 53 insertions(+), 6 deletions(-)
diff --git a/builtin/am.c b/builtin/am.c
index 30c9b3a9cd..a80d40f4be 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1476,14 +1476,16 @@ static int run_apply(const struct am_state *state, const char *index_file)
int res, opts_left;
int force_apply = 0;
int options = 0;
+ const char **apply_argv;
if (init_apply_state(&apply_state, the_repository, NULL))
BUG("init_apply_state() failed");
strvec_push(&apply_opts, "apply");
strvec_pushv(&apply_opts, state->git_apply_opts.v);
+ apply_argv = strvec_clone_nodup(&apply_opts);
- opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
+ opts_left = apply_parse_options(apply_opts.nr, apply_argv,
&apply_state, &force_apply, &options,
NULL);
@@ -1513,6 +1515,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
strvec_clear(&apply_paths);
strvec_clear(&apply_opts);
clear_apply_state(&apply_state);
+ free(apply_argv);
if (res)
return res;
diff --git a/builtin/annotate.c b/builtin/annotate.c
index 58ff977a23..7e75f0f48d 100644
--- a/builtin/annotate.c
+++ b/builtin/annotate.c
@@ -10,6 +10,8 @@
int cmd_annotate(int argc, const char **argv, const char *prefix)
{
struct strvec args = STRVEC_INIT;
+ const char **blame_argv;
+ int ret;
int i;
strvec_pushl(&args, "annotate", "-c", NULL);
@@ -17,6 +19,11 @@ int cmd_annotate(int argc, const char **argv, const char *prefix)
for (i = 1; i < argc; i++) {
strvec_push(&args, argv[i]);
}
+ blame_argv = strvec_clone_nodup(&args);
- return cmd_blame(args.nr, args.v, prefix);
+ ret = cmd_blame(args.nr, blame_argv, prefix);
+
+ strvec_clear(&args);
+ free(blame_argv);
+ return ret;
}
diff --git a/builtin/describe.c b/builtin/describe.c
index eea1e330c0..2ef2fbc0d4 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -598,6 +598,8 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
if (contains) {
struct string_list_item *item;
struct strvec args;
+ const char **name_rev_argv;
+ int ret;
strvec_init(&args);
strvec_pushl(&args, "name-rev",
@@ -616,7 +618,11 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
strvec_pushv(&args, argv);
else
strvec_push(&args, "HEAD");
- return cmd_name_rev(args.nr, args.v, prefix);
+ name_rev_argv = strvec_clone_nodup(&args);
+ ret = cmd_name_rev(args.nr, name_rev_argv, prefix);
+ strvec_clear(&args);
+ free(name_rev_argv);
+ return ret;
}
hashmap_init(&names, commit_name_neq, NULL, 0);
diff --git a/builtin/stash.c b/builtin/stash.c
index bb0fd86143..864b7c573a 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1820,6 +1820,8 @@ static int save_stash(int argc, const char **argv, const char *prefix)
int cmd_stash(int argc, const char **argv, const char *prefix)
{
+ int ret;
+ const char **push_stash_argv;
pid_t pid = getpid();
const char *index_file;
struct strvec args = STRVEC_INIT;
@@ -1861,5 +1863,11 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
/* Assume 'stash push' */
strvec_push(&args, "push");
strvec_pushv(&args, argv);
- return !!push_stash(args.nr, args.v, prefix, 1);
+ push_stash_argv = strvec_clone_nodup(&args);
+
+ ret = !!push_stash(args.nr, push_stash_argv, prefix, 1);
+
+ strvec_clear(&args);
+ free(push_stash_argv);
+ return ret;
}
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 945ee2b412..36ed85e10a 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -19,6 +19,8 @@ static const char deadchild[] =
int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
{
+ int ret;
+ const char **upload_archive_argv;
struct strvec sent_argv = STRVEC_INIT;
const char *arg_cmd = "argument ";
@@ -43,10 +45,15 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
die("'argument' token or flush expected");
strvec_push(&sent_argv, buf + strlen(arg_cmd));
}
+ upload_archive_argv = strvec_clone_nodup(&sent_argv);
/* parse all options sent by the client */
- return write_archive(sent_argv.nr, sent_argv.v, prefix,
- the_repository, NULL, 1);
+ ret = write_archive(sent_argv.nr, upload_archive_argv, prefix,
+ the_repository, NULL, 1);
+
+ strvec_clear(&sent_argv);
+ free(upload_archive_argv);
+ return ret;
}
__attribute__((format (printf, 1, 2)))
diff --git a/strvec.c b/strvec.c
index 61a76ce6cb..5c41c8c506 100644
--- a/strvec.c
+++ b/strvec.c
@@ -106,3 +106,12 @@ const char **strvec_detach(struct strvec *array)
return ret;
}
}
+
+const char **strvec_clone_nodup(const struct strvec *sv)
+{
+ const char **ret;
+
+ CALLOC_ARRAY(ret, sv->nr + 1);
+ COPY_ARRAY(ret, sv->v, sv->nr);
+ return ret;
+}
diff --git a/strvec.h b/strvec.h
index 9f55c8766b..6234634b00 100644
--- a/strvec.h
+++ b/strvec.h
@@ -88,4 +88,11 @@ void strvec_clear(struct strvec *);
*/
const char **strvec_detach(struct strvec *);
+/**
+ * Create a copy of the array that references the original strings.
+ * The caller is responsible for freeing the memory of that copy,
+ * but must not free the individual strings.
+ */
+const char **strvec_clone_nodup(const struct strvec *);
+
#endif /* STRVEC_H */
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
2022-12-15 9:11 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
` (5 preceding siblings ...)
2022-12-17 12:45 ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks René Scharfe
@ 2022-12-17 13:13 ` Jeff King
2022-12-19 9:20 ` Ævar Arnfjörð Bjarmason
6 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2022-12-17 13:13 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, René Scharfe, Junio C Hamano
On Thu, Dec 15, 2022 at 10:11:06AM +0100, Ævar Arnfjörð Bjarmason wrote:
> This is an alternative to René's [1], his already fixes a leak in "git
> am", and this could be done later, so I'm submitting it as RFC, but it
> could also replace it.
>
> I think as this series shows extending the "strvec" API to get a
> feature that works like the existing "strdup_strings" that the "struct
> string_list" has can make memory management much simpler.
I know this is kind of a surface level review, but...please don't do
this. We have chased so many bugs over the years due to string-list's
"maybe this is allocated and maybe not", in both directions (accidental
leaks and double-frees).
One of the reasons I advocated for strvec in the first place is so that
it would have consistent memory management semantics, at the minor cost
of sometimes duplicating them when we don't need to.
And having a nodup form doesn't even save you from having to call
strvec_clear(); you still need to do so to avoid leaking the array
itself. It only helps in the weird parse-options case, where we don't
handle ownership of the array very well (the strvec owns it, but
parse-options wants to modify it).
> This does make the API slightly more dangerous to use, as it's no
> longer guaranteed that it owns all the members it points to. But as
> the "struct string_list" has shown this isn't an issue in practice,
> and e.g. SANITIZE=address et al are good about finding double-frees,
> or frees of fixed strings.
I would disagree that this hasn't been an issue in practice. A few
recent examples:
- 5eeb9aa208 (refs: fix memory leak when parsing hideRefs config,
2022-11-17)
- 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
strings, 2022-09-08)
- 4c81ee9669 (submodule--helper: fix "reference" leak, 2022-09-01)
Now you could argue that those leaks might still exist if we only had a
duplicating version of string-list (after all, the problem in a leak is
an extra duplication). But IMHO it is the ambiguity and the games we
play with setting/unsetting the strdup_strings field that lead to these
errors.
And yes, leak-checking and sanitizers can sometimes find these bugs. But
that implies triggering the bug in the test suite. And it implies extra
time to track and fix them. An interface which is harder to get wrong in
the first place is preferable.
-Peff
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
2022-12-17 13:13 ` Jeff King
@ 2022-12-19 9:20 ` Ævar Arnfjörð Bjarmason
2023-01-07 13:21 ` Jeff King
0 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-19 9:20 UTC (permalink / raw)
To: Jeff King; +Cc: git, René Scharfe, Junio C Hamano
On Sat, Dec 17 2022, Jeff King wrote:
> On Thu, Dec 15, 2022 at 10:11:06AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This is an alternative to René's [1], his already fixes a leak in "git
>> am", and this could be done later, so I'm submitting it as RFC, but it
>> could also replace it.
>>
>> I think as this series shows extending the "strvec" API to get a
>> feature that works like the existing "strdup_strings" that the "struct
>> string_list" has can make memory management much simpler.
>
> I know this is kind of a surface level review, but...please don't do
> this. We have chased so many bugs over the years due to string-list's
> "maybe this is allocated and maybe not", in both directions (accidental
> leaks and double-frees).
>
> One of the reasons I advocated for strvec in the first place is so that
> it would have consistent memory management semantics, at the minor cost
> of sometimes duplicating them when we don't need to.
>
> And having a nodup form doesn't even save you from having to call
> strvec_clear(); you still need to do so to avoid leaking the array
> itself. It only helps in the weird parse-options case, where we don't
> handle ownership of the array very well (the strvec owns it, but
> parse-options wants to modify it).
Yes, just like "struct string_list" in the "nodup" mode.
I hear you, but I also think you're implicitly conflating two things
here.
There's the question of whether we should in general optimize for safety
over more optimila memory use. I.e. if we simply have every strvec,
string_list etc. own its memory fully we don't need to think as much
about allocation or ownership.
I think we should do that in general, but we also have cases where we'd
like to not do that, e.g. where we're adding thousands of strings to a
string_list, which are all borrewed from elsewhere, except for a few
we'd like to xstrdup().
Such API use *is* tricky, but I think formalizing it as the
"string_list" does is making it better, not worse. In particular...:
>> This does make the API slightly more dangerous to use, as it's no
>> longer guaranteed that it owns all the members it points to. But as
>> the "struct string_list" has shown this isn't an issue in practice,
>> and e.g. SANITIZE=address et al are good about finding double-frees,
>> or frees of fixed strings.
>
> I would disagree that this hasn't been an issue in practice. A few
> recent examples:
>
> - 5eeb9aa208 (refs: fix memory leak when parsing hideRefs config,
> 2022-11-17)
> - 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
> strings, 2022-09-08)
> - 4c81ee9669 (submodule--helper: fix "reference" leak, 2022-09-01)
...it's funny that those are the examples I would have dug up to argue
that this is a good idea, and to add some:
- 4a0479086a9 (commit-graph: fix memory leak in misused
string_list API, 2022-03-04)
- b202e51b154 (grep: fix a "path_list" memory leak, 2021-10-22)
I.e. above you note "in both directions [...] leaks [...] and double
frees", but these (and the ones I added) are all in the second category.
Which is why I don't think it's an issue in practice. The leaks have
been a non-issue, and to the extent that we care the SANITIZE=leak
testing is closing that gap.
The dangerous issue is the double-free, but (and over the years I have
looked at pretty much every caller) I can't imagine a string_list
use-case where we:
a) Actually still want to keep that memory optimization, i.e. it
wouldn't be better by just saying "screw it, let's dup it".
b) Given "a", we'd be better off with some bespoke custom pattern over
the facility to do this with the "string_list".
So I really think we're in agreement about 99% of this, I just don't see
how *if* we want to do this why we're better of re-inventing this
particular wheel for every caller.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
2022-12-19 9:20 ` Ævar Arnfjörð Bjarmason
@ 2023-01-07 13:21 ` Jeff King
0 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2023-01-07 13:21 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, René Scharfe, Junio C Hamano
On Mon, Dec 19, 2022 at 10:20:00AM +0100, Ævar Arnfjörð Bjarmason wrote:
> I hear you, but I also think you're implicitly conflating two things
> here.
>
> There's the question of whether we should in general optimize for safety
> over more optimila memory use. I.e. if we simply have every strvec,
> string_list etc. own its memory fully we don't need to think as much
> about allocation or ownership.
>
> I think we should do that in general, but we also have cases where we'd
> like to not do that, e.g. where we're adding thousands of strings to a
> string_list, which are all borrewed from elsewhere, except for a few
> we'd like to xstrdup().
>
> Such API use *is* tricky, but I think formalizing it as the
> "string_list" does is making it better, not worse. In particular...:
I think the two things are related, though, because "safety" is "people
not mis-using the API". Once you give the API the option to do the
trickier thing, people will do it, and they will do it badly. That has
been my experience with the string-list API (especially that people try
to do clever things with transferring ownership by using a
non-duplicating list and then setting strdup_strings midway through the
function).
> > I would disagree that this hasn't been an issue in practice. A few
> > recent examples:
> >
> > - 5eeb9aa208 (refs: fix memory leak when parsing hideRefs config,
> > 2022-11-17)
> > - 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
> > strings, 2022-09-08)
> > - 4c81ee9669 (submodule--helper: fix "reference" leak, 2022-09-01)
>
> ...it's funny that those are the examples I would have dug up to argue
> that this is a good idea, and to add some:
>
> - 4a0479086a9 (commit-graph: fix memory leak in misused
> string_list API, 2022-03-04)
> - b202e51b154 (grep: fix a "path_list" memory leak, 2021-10-22)
>
> I.e. above you note "in both directions [...] leaks [...] and double
> frees", but these (and the ones I added) are all in the second category.
I almost didn't give a list, because it's hard to dig into all of the
details. For example the second one in my list, which I worked on, did
fix things by consistently setting strdup_strings, and it was "just" a
leak fix. But it took me a full day to untangle the mess of that code,
and there were intermediate states were we did access dangling pointers
before I got to a working order of the series. And all of it was
complicated by the fact that string_list is configurable, so different
bits of the code expected different things. Even after that commit,
there's a horrible hack to set the strdup field and hope it's enough.
If that were the only time I'd wrestled with string_list, I'd be less
concerned. But (subjectively) this feels like the latest in a long line
of bugs and irritations.
-Peff
^ permalink raw reply [flat|nested] 24+ messages in thread