git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/6] getenv() timing fixes
@ 2019-01-11 22:14 Jeff King
  2019-01-11 22:15 ` [PATCH 1/6] get_super_prefix(): copy getenv() result Jeff King
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Jeff King @ 2019-01-11 22:14 UTC (permalink / raw)
  To: git

Similar to the recent:

  https://public-inbox.org/git/20190109221007.21624-1-kgybels@infogroep.be/

there are some other places where we do not follow the POSIX rule that
getenv()'s return value may be invalidated by other calls to getenv() or
setenv().

For the most part we haven't noticed because:

  - on many platforms, you can call getenv() as many times as you want.
    This changed recently in our mingw_getenv() helper, which is why
    people are noticing now.

  - calling setenv() in between _often_ works, but it depends on whether
    libc feels like it needs to reallocate memory. Which is itself
    platform specific, and even on a single platform may depend on
    things like how many environment variables you have set.

The first patch here is a problem somebody actually found in the wild.
That led me to start looking through the results of:

  git grep '= getenv('

There are a ton of hits. I poked at the first 20 or so. A lot of them
are fine, as they do something like this:

  rla = getenv("GIT_REFLOG_ACTION");
  strbuf_addstr("blah blah %s", rla);

That's not _strictly_ correct, because strbuf_addstr() may actually look
at the environment. But it works for our mingw_getenv() case, because
there we use a rotating series of buffers. So as long as it doesn't look at
30 environment variables, we're fine. And many calls fall into that
bucket (a more complicated one is get_ssh_command(), which runs a fair
bit of code while holding the pointer, but ultimately probably has a
small fixed number of opportunities to call getenv(). What is more
worrisome is code that holds a pointer across an arbitrary number of
calls (like once per diff'd file, or once per submodule, etc).

Of course it's possible for some platform libc to use a single buffer.
But in that case, I'd argue that the path of least resistance is
wrapping getenv, like I described in:

  https://public-inbox.org/git/20181025062037.GC11460@sigill.intra.peff.net/

So anyway. Here are a handful of what seem like pretty low-hanging
fruit. Beyond the first one, I'm not sure if they're triggerable, but
they're easy to fix. There are 100+ grep matches that I _didn't_ audit,
so this is by no means a complete fix. I was mostly trying to get a
sense of how painful these fixes would be.

  [1/6]: get_super_prefix(): copy getenv() result
  [2/6]: commit: copy saved getenv() result
  [3/6]: config: make a copy of $GIT_CONFIG string
  [4/6]: init: make a copy of $GIT_DIR string
  [5/6]: merge-recursive: copy $GITHEAD strings
  [6/6]: builtin_diff(): read $GIT_DIFF_OPTS closer to use

 builtin/commit.c          |  3 ++-
 builtin/config.c          |  2 +-
 builtin/init-db.c         |  6 ++++--
 builtin/merge-recursive.c | 15 ++++++++++-----
 diff.c                    |  5 ++++-
 environment.c             |  4 ++--
 6 files changed, 23 insertions(+), 12 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] get_super_prefix(): copy getenv() result
  2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
@ 2019-01-11 22:15 ` Jeff King
  2019-01-12  3:02   ` Junio C Hamano
  2019-01-11 22:15 ` [PATCH 2/6] commit: copy saved " Jeff King
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-11 22:15 UTC (permalink / raw)
  To: git

The return value of getenv() is not guaranteed to remain valid across
multiple calls (nor across calls to setenv()). Since this function
caches the result for the length of the program, we must make a copy to
ensure that it is still valid when we need it.

Reported-by: Yngve N. Pettersen <yngve@vivaldi.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 environment.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/environment.c b/environment.c
index 0e37741d83..89af47cb85 100644
--- a/environment.c
+++ b/environment.c
@@ -107,7 +107,7 @@ char *git_work_tree_cfg;
 
 static char *git_namespace;
 
-static const char *super_prefix;
+static char *super_prefix;
 
 /*
  * Repository-local GIT_* environment variables; see cache.h for details.
@@ -240,7 +240,7 @@ const char *get_super_prefix(void)
 {
 	static int initialized;
 	if (!initialized) {
-		super_prefix = getenv(GIT_SUPER_PREFIX_ENVIRONMENT);
+		super_prefix = xstrdup_or_null(getenv(GIT_SUPER_PREFIX_ENVIRONMENT));
 		initialized = 1;
 	}
 	return super_prefix;
-- 
2.20.1.651.g2d41a78c67


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/6] commit: copy saved getenv() result
  2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
  2019-01-11 22:15 ` [PATCH 1/6] get_super_prefix(): copy getenv() result Jeff King
@ 2019-01-11 22:15 ` Jeff King
  2019-01-12  3:07   ` Junio C Hamano
  2019-01-11 22:15 ` [PATCH 3/6] config: make a copy of $GIT_CONFIG string Jeff King
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-11 22:15 UTC (permalink / raw)
  To: git

We save the result of $GIT_INDEX_FILE so that we can restore it after
setting it to a new value and running add--interactive. However, the
pointer returned by getenv() is not guaranteed to be valid after calling
setenv(). This _usually_ works fine, but can fail if libc needs to
reallocate the environment block during the setenv().

Let's just duplicate the string, so we know that it remains valid.

In the long run it may be more robust to teach interactive_add() to take
a set of environment variables to pass along to run-command when it
execs add--interactive. And then we would not have to do this
save/restore dance at all. But this is an easy fix in the meantime.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 004b816635..7d2e0b61e5 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -351,7 +351,7 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 		if (write_locked_index(&the_index, &index_lock, 0))
 			die(_("unable to create temporary index"));
 
-		old_index_env = getenv(INDEX_ENVIRONMENT);
+		old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT));
 		setenv(INDEX_ENVIRONMENT, get_lock_file_path(&index_lock), 1);
 
 		if (interactive_add(argc, argv, prefix, patch_interactive) != 0)
@@ -361,6 +361,7 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 			setenv(INDEX_ENVIRONMENT, old_index_env, 1);
 		else
 			unsetenv(INDEX_ENVIRONMENT);
+		FREE_AND_NULL(old_index_env);
 
 		discard_cache();
 		read_cache_from(get_lock_file_path(&index_lock));
-- 
2.20.1.651.g2d41a78c67


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/6] config: make a copy of $GIT_CONFIG string
  2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
  2019-01-11 22:15 ` [PATCH 1/6] get_super_prefix(): copy getenv() result Jeff King
  2019-01-11 22:15 ` [PATCH 2/6] commit: copy saved " Jeff King
@ 2019-01-11 22:15 ` Jeff King
  2019-01-11 22:16 ` [PATCH 4/6] init: make a copy of $GIT_DIR string Jeff King
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2019-01-11 22:15 UTC (permalink / raw)
  To: git

cmd_config() points our source filename pointer at the return value of
getenv(), but that value may be invalidated by further calls to
environment functions. Let's copy it to make sure it remains valid.

We don't need to bother freeing it, as it remains part of the
whole-process global state until we exit.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/config.c b/builtin/config.c
index 84385ef165..2db4e763e7 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -598,7 +598,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
 	int nongit = !startup_info->have_repository;
 	char *value;
 
-	given_config_source.file = getenv(CONFIG_ENVIRONMENT);
+	given_config_source.file = xstrdup_or_null(getenv(CONFIG_ENVIRONMENT));
 
 	argc = parse_options(argc, argv, prefix, builtin_config_options,
 			     builtin_config_usage,
-- 
2.20.1.651.g2d41a78c67


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/6] init: make a copy of $GIT_DIR string
  2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
                   ` (2 preceding siblings ...)
  2019-01-11 22:15 ` [PATCH 3/6] config: make a copy of $GIT_CONFIG string Jeff King
@ 2019-01-11 22:16 ` Jeff King
  2019-01-12  3:08   ` Junio C Hamano
  2019-01-11 22:16 ` [PATCH 5/6] merge-recursive: copy $GITHEAD strings Jeff King
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-11 22:16 UTC (permalink / raw)
  To: git

We pass the result of getenv("GIT_DIR") to init_db() and assume that the
string remains valid. But that's not guaranteed across calls to setenv()
or even getenv(), although it often works in practice. Let's make a copy
of the string so that we follow the rules.

Note that we need to mark it with UNLEAK(), since the value persists
until the end of program (but we have no opportunity to free it).

This patch also handles $GIT_WORK_TREE the same way. It actually doesn't
have as long a lifetime and is probably fine, but it's simpler to just
treat the two side-by-side variables the same.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/init-db.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 41faffd28d..93eff7618c 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -542,8 +542,8 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	 * GIT_WORK_TREE makes sense only in conjunction with GIT_DIR
 	 * without --bare.  Catch the error early.
 	 */
-	git_dir = getenv(GIT_DIR_ENVIRONMENT);
-	work_tree = getenv(GIT_WORK_TREE_ENVIRONMENT);
+	git_dir = xstrdup_or_null(getenv(GIT_DIR_ENVIRONMENT));
+	work_tree = xstrdup_or_null(getenv(GIT_WORK_TREE_ENVIRONMENT));
 	if ((!git_dir || is_bare_repository_cfg == 1) && work_tree)
 		die(_("%s (or --work-tree=<directory>) not allowed without "
 			  "specifying %s (or --git-dir=<directory>)"),
@@ -582,6 +582,8 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(git_dir);
+	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
 	return init_db(git_dir, real_git_dir, template_dir, flags);
-- 
2.20.1.651.g2d41a78c67


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/6] merge-recursive: copy $GITHEAD strings
  2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
                   ` (3 preceding siblings ...)
  2019-01-11 22:16 ` [PATCH 4/6] init: make a copy of $GIT_DIR string Jeff King
@ 2019-01-11 22:16 ` Jeff King
  2019-01-12  3:10   ` Junio C Hamano
  2019-01-11 22:17 ` [PATCH 6/6] builtin_diff(): read $GIT_DIFF_OPTS closer to use Jeff King
  2019-01-12 11:31 ` [PATCH 0/6] getenv() timing fixes Ævar Arnfjörð Bjarmason
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-11 22:16 UTC (permalink / raw)
  To: git

If $GITHEAD_1234abcd is set in the environment, we use its value as a
"better branch name" in generating conflict markers. However, we pick
these better names early in the process, and the return value from
getenv() is not guaranteed to stay valid.

Let's make a copy of the returned string. And to make memory management
easier, let's just always return an allocated string from
better_branch_name(), so we know that it must always be freed.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/merge-recursive.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/builtin/merge-recursive.c b/builtin/merge-recursive.c
index 9b2f707c29..7545136c2a 100644
--- a/builtin/merge-recursive.c
+++ b/builtin/merge-recursive.c
@@ -7,16 +7,16 @@
 static const char builtin_merge_recursive_usage[] =
 	"git %s <base>... -- <head> <remote> ...";
 
-static const char *better_branch_name(const char *branch)
+static char *better_branch_name(const char *branch)
 {
 	static char githead_env[8 + GIT_MAX_HEXSZ + 1];
 	char *name;
 
 	if (strlen(branch) != the_hash_algo->hexsz)
-		return branch;
+		return xstrdup(branch);
 	xsnprintf(githead_env, sizeof(githead_env), "GITHEAD_%s", branch);
 	name = getenv(githead_env);
-	return name ? name : branch;
+	return xstrdup(name ? name : branch);
 }
 
 int cmd_merge_recursive(int argc, const char **argv, const char *prefix)
@@ -26,6 +26,7 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix)
 	int i, failed;
 	struct object_id h1, h2;
 	struct merge_options o;
+	char *better1, *better2;
 	struct commit *result;
 
 	init_merge_options(&o);
@@ -70,13 +71,17 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix)
 	if (get_oid(o.branch2, &h2))
 		die(_("could not resolve ref '%s'"), o.branch2);
 
-	o.branch1 = better_branch_name(o.branch1);
-	o.branch2 = better_branch_name(o.branch2);
+	o.branch1 = better1 = better_branch_name(o.branch1);
+	o.branch2 = better2 = better_branch_name(o.branch2);
 
 	if (o.verbosity >= 3)
 		printf(_("Merging %s with %s\n"), o.branch1, o.branch2);
 
 	failed = merge_recursive_generic(&o, &h1, &h2, bases_count, bases, &result);
+
+	free(better1);
+	free(better2);
+
 	if (failed < 0)
 		return 128; /* die() error code */
 	return failed;
-- 
2.20.1.651.g2d41a78c67


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/6] builtin_diff(): read $GIT_DIFF_OPTS closer to use
  2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
                   ` (4 preceding siblings ...)
  2019-01-11 22:16 ` [PATCH 5/6] merge-recursive: copy $GITHEAD strings Jeff King
@ 2019-01-11 22:17 ` Jeff King
  2019-01-12 11:31 ` [PATCH 0/6] getenv() timing fixes Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2019-01-11 22:17 UTC (permalink / raw)
  To: git

The value returned by getenv() is not guaranteed to remain valid across
other environment function calls. But in between our call and using the
value, we run fill_textconv(), which may do quite a bit of work,
including spawning sub-processes.

We can make this safer by calling getenv() right before we actually look
at its value.

Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 15556c190d..6751ec29f0 100644
--- a/diff.c
+++ b/diff.c
@@ -3476,7 +3476,7 @@ static void builtin_diff(const char *name_a,
 		o->found_changes = 1;
 	} else {
 		/* Crazy xdl interfaces.. */
-		const char *diffopts = getenv("GIT_DIFF_OPTS");
+		const char *diffopts;
 		const char *v;
 		xpparam_t xpp;
 		xdemitconf_t xecfg;
@@ -3519,12 +3519,15 @@ static void builtin_diff(const char *name_a,
 			xecfg.flags |= XDL_EMIT_FUNCCONTEXT;
 		if (pe)
 			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
+
+		diffopts = getenv("GIT_DIFF_OPTS");
 		if (!diffopts)
 			;
 		else if (skip_prefix(diffopts, "--unified=", &v))
 			xecfg.ctxlen = strtoul(v, NULL, 10);
 		else if (skip_prefix(diffopts, "-u", &v))
 			xecfg.ctxlen = strtoul(v, NULL, 10);
+
 		if (o->word_diff)
 			init_diff_words_data(&ecbdata, o, one, two);
 		if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
-- 
2.20.1.651.g2d41a78c67

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] get_super_prefix(): copy getenv() result
  2019-01-11 22:15 ` [PATCH 1/6] get_super_prefix(): copy getenv() result Jeff King
@ 2019-01-12  3:02   ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2019-01-12  3:02 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> The return value of getenv() is not guaranteed to remain valid across
> multiple calls (nor across calls to setenv()). Since this function
> caches the result for the length of the program, we must make a copy to
> ensure that it is still valid when we need it.
>
> Reported-by: Yngve N. Pettersen <yngve@vivaldi.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Makes sense.  Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] commit: copy saved getenv() result
  2019-01-11 22:15 ` [PATCH 2/6] commit: copy saved " Jeff King
@ 2019-01-12  3:07   ` Junio C Hamano
  2019-01-12 10:26     ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2019-01-12  3:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> We save the result of $GIT_INDEX_FILE so that we can restore it after
> setting it to a new value and running add--interactive. However, the
> pointer returned by getenv() is not guaranteed to be valid after calling
> setenv(). This _usually_ works fine, but can fail if libc needs to
> reallocate the environment block during the setenv().
>
> Let's just duplicate the string, so we know that it remains valid.
>
> In the long run it may be more robust to teach interactive_add() to take
> a set of environment variables to pass along to run-command when it
> execs add--interactive. And then we would not have to do this
> save/restore dance at all. But this is an easy fix in the meantime.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/commit.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 004b816635..7d2e0b61e5 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -351,7 +351,7 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
>  		if (write_locked_index(&the_index, &index_lock, 0))
>  			die(_("unable to create temporary index"));
>  
> -		old_index_env = getenv(INDEX_ENVIRONMENT);
> +		old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT));
>  		setenv(INDEX_ENVIRONMENT, get_lock_file_path(&index_lock), 1);
>  
>  		if (interactive_add(argc, argv, prefix, patch_interactive) != 0)
> @@ -361,6 +361,7 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
>  			setenv(INDEX_ENVIRONMENT, old_index_env, 1);
>  		else
>  			unsetenv(INDEX_ENVIRONMENT);
> +		FREE_AND_NULL(old_index_env);
>  
>  		discard_cache();
>  		read_cache_from(get_lock_file_path(&index_lock));

Even though it is not wrong per-se to assign a NULL to the
now-no-longer-referenced variable, I do not quite get why it is
free-and-null, not a straight free.  This may be a taste-thing,
though.

Even if a future update needs to make it possible to access
old_index_env somewhere in the block after discard_cache() gets
called, we would need to push down the free (or free-and-null) to
prolong its lifetime a bit anyway, so...



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] init: make a copy of $GIT_DIR string
  2019-01-11 22:16 ` [PATCH 4/6] init: make a copy of $GIT_DIR string Jeff King
@ 2019-01-12  3:08   ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2019-01-12  3:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> We pass the result of getenv("GIT_DIR") to init_db() and assume that the
> string remains valid. But that's not guaranteed across calls to setenv()
> or even getenv(), although it often works in practice. Let's make a copy
> of the string so that we follow the rules.
>
> Note that we need to mark it with UNLEAK(), since the value persists
> until the end of program (but we have no opportunity to free it).

Makes sense.  Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] merge-recursive: copy $GITHEAD strings
  2019-01-11 22:16 ` [PATCH 5/6] merge-recursive: copy $GITHEAD strings Jeff King
@ 2019-01-12  3:10   ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2019-01-12  3:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> If $GITHEAD_1234abcd is set in the environment, we use its value as a
> "better branch name" in generating conflict markers. However, we pick
> these better names early in the process, and the return value from
> getenv() is not guaranteed to stay valid.
>
> Let's make a copy of the returned string. And to make memory management
> easier, let's just always return an allocated string from
> better_branch_name(), so we know that it must always be freed.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/merge-recursive.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)

Thanks.  Will queue.

>
> diff --git a/builtin/merge-recursive.c b/builtin/merge-recursive.c
> index 9b2f707c29..7545136c2a 100644
> --- a/builtin/merge-recursive.c
> +++ b/builtin/merge-recursive.c
> @@ -7,16 +7,16 @@
>  static const char builtin_merge_recursive_usage[] =
>  	"git %s <base>... -- <head> <remote> ...";
>  
> -static const char *better_branch_name(const char *branch)
> +static char *better_branch_name(const char *branch)
>  {
>  	static char githead_env[8 + GIT_MAX_HEXSZ + 1];
>  	char *name;
>  
>  	if (strlen(branch) != the_hash_algo->hexsz)
> -		return branch;
> +		return xstrdup(branch);
>  	xsnprintf(githead_env, sizeof(githead_env), "GITHEAD_%s", branch);
>  	name = getenv(githead_env);
> -	return name ? name : branch;
> +	return xstrdup(name ? name : branch);
>  }
>  
>  int cmd_merge_recursive(int argc, const char **argv, const char *prefix)
> @@ -26,6 +26,7 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix)
>  	int i, failed;
>  	struct object_id h1, h2;
>  	struct merge_options o;
> +	char *better1, *better2;
>  	struct commit *result;
>  
>  	init_merge_options(&o);
> @@ -70,13 +71,17 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix)
>  	if (get_oid(o.branch2, &h2))
>  		die(_("could not resolve ref '%s'"), o.branch2);
>  
> -	o.branch1 = better_branch_name(o.branch1);
> -	o.branch2 = better_branch_name(o.branch2);
> +	o.branch1 = better1 = better_branch_name(o.branch1);
> +	o.branch2 = better2 = better_branch_name(o.branch2);
>  
>  	if (o.verbosity >= 3)
>  		printf(_("Merging %s with %s\n"), o.branch1, o.branch2);
>  
>  	failed = merge_recursive_generic(&o, &h1, &h2, bases_count, bases, &result);
> +
> +	free(better1);
> +	free(better2);
> +
>  	if (failed < 0)
>  		return 128; /* die() error code */
>  	return failed;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] commit: copy saved getenv() result
  2019-01-12  3:07   ` Junio C Hamano
@ 2019-01-12 10:26     ` Jeff King
  2019-01-15 14:05       ` Johannes Schindelin
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-12 10:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Jan 11, 2019 at 07:07:15PM -0800, Junio C Hamano wrote:

> > diff --git a/builtin/commit.c b/builtin/commit.c
> > index 004b816635..7d2e0b61e5 100644
> > --- a/builtin/commit.c
> > +++ b/builtin/commit.c
> > @@ -351,7 +351,7 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
> >  		if (write_locked_index(&the_index, &index_lock, 0))
> >  			die(_("unable to create temporary index"));
> >  
> > -		old_index_env = getenv(INDEX_ENVIRONMENT);
> > +		old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT));
> >  		setenv(INDEX_ENVIRONMENT, get_lock_file_path(&index_lock), 1);
> >  
> >  		if (interactive_add(argc, argv, prefix, patch_interactive) != 0)
> > @@ -361,6 +361,7 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
> >  			setenv(INDEX_ENVIRONMENT, old_index_env, 1);
> >  		else
> >  			unsetenv(INDEX_ENVIRONMENT);
> > +		FREE_AND_NULL(old_index_env);
> >  
> >  		discard_cache();
> >  		read_cache_from(get_lock_file_path(&index_lock));
> 
> Even though it is not wrong per-se to assign a NULL to the
> now-no-longer-referenced variable, I do not quite get why it is
> free-and-null, not a straight free.  This may be a taste-thing,
> though.
> 
> Even if a future update needs to make it possible to access
> old_index_env somewhere in the block after discard_cache() gets
> called, we would need to push down the free (or free-and-null) to
> prolong its lifetime a bit anyway, so...

My thinking was that if we simply call free(), then the variable is left
as a dangling pointer for the rest of the function, making it easy to
accidentally use-after-free.

But certainly it would not be the first such instance in our code base.
In theory a static analyzer should easily be able to figure out such a
problem, too, so maybe it is not worth being defensive about.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] getenv() timing fixes
  2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
                   ` (5 preceding siblings ...)
  2019-01-11 22:17 ` [PATCH 6/6] builtin_diff(): read $GIT_DIFF_OPTS closer to use Jeff King
@ 2019-01-12 11:31 ` Ævar Arnfjörð Bjarmason
  2019-01-12 18:51   ` Stefan Beller
  2019-01-15 19:12   ` Jeff King
  6 siblings, 2 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-01-12 11:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git


On Fri, Jan 11 2019, Jeff King wrote:

> Similar to the recent:
>
>   https://public-inbox.org/git/20190109221007.21624-1-kgybels@infogroep.be/
>
> there are some other places where we do not follow the POSIX rule that
> getenv()'s return value may be invalidated by other calls to getenv() or
> setenv().
>
> For the most part we haven't noticed because:
>
>   - on many platforms, you can call getenv() as many times as you want.
>     This changed recently in our mingw_getenv() helper, which is why
>     people are noticing now.
>
>   - calling setenv() in between _often_ works, but it depends on whether
>     libc feels like it needs to reallocate memory. Which is itself
>     platform specific, and even on a single platform may depend on
>     things like how many environment variables you have set.
>
> The first patch here is a problem somebody actually found in the wild.
> That led me to start looking through the results of:
>
>   git grep '= getenv('
>
> There are a ton of hits. I poked at the first 20 or so. A lot of them
> are fine, as they do something like this:
>
>   rla = getenv("GIT_REFLOG_ACTION");
>   strbuf_addstr("blah blah %s", rla);
>
> That's not _strictly_ correct, because strbuf_addstr() may actually look
> at the environment. But it works for our mingw_getenv() case, because
> there we use a rotating series of buffers. So as long as it doesn't look at
> 30 environment variables, we're fine. And many calls fall into that
> bucket (a more complicated one is get_ssh_command(), which runs a fair
> bit of code while holding the pointer, but ultimately probably has a
> small fixed number of opportunities to call getenv(). What is more
> worrisome is code that holds a pointer across an arbitrary number of
> calls (like once per diff'd file, or once per submodule, etc).
>
> Of course it's possible for some platform libc to use a single buffer.
> But in that case, I'd argue that the path of least resistance is
> wrapping getenv, like I described in:
>
>   https://public-inbox.org/git/20181025062037.GC11460@sigill.intra.peff.net/
>
> So anyway. Here are a handful of what seem like pretty low-hanging
> fruit. Beyond the first one, I'm not sure if they're triggerable, but
> they're easy to fix. There are 100+ grep matches that I _didn't_ audit,
> so this is by no means a complete fix. I was mostly trying to get a
> sense of how painful these fixes would be.

I wonder, and not as "you should do this" feedback on this series, just
on future development, whether we shouldn't just make our own getenv()
wrapper for the majority of the GIT_* variables. The semantics would be
fetch value X, and if it's ever requested again return the value we
found the first time.

For some things we rely on getenv(X) -> setenv(X) -> getenv(X) returning
different values of X, e.g. in passing along config, but for
e.g. GIT_TEST_* variables we just want to check them once, and have our
own ad-hoc caches (via static variables) in a couple of places.

Maybe such an API would just loop over "environ" on startup, looking for
any GIT_* variables, i.e. called from the setup.c code.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] getenv() timing fixes
  2019-01-12 11:31 ` [PATCH 0/6] getenv() timing fixes Ævar Arnfjörð Bjarmason
@ 2019-01-12 18:51   ` Stefan Beller
  2019-01-15 19:13     ` Jeff King
  2019-01-15 19:12   ` Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2019-01-12 18:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jeff King, git

> I wonder, and not as "you should do this" feedback on this series, just

There is a getenv_safe() in environment.c, but I guess a xgetenv() that
takes the same parameters as getenv() is better for ease of use.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] commit: copy saved getenv() result
  2019-01-12 10:26     ` Jeff King
@ 2019-01-15 14:05       ` Johannes Schindelin
  2019-01-15 19:17         ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Johannes Schindelin @ 2019-01-15 14:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

Hi Peff,

On Sat, 12 Jan 2019, Jeff King wrote:

> On Fri, Jan 11, 2019 at 07:07:15PM -0800, Junio C Hamano wrote:
> 
> > > diff --git a/builtin/commit.c b/builtin/commit.c
> > > index 004b816635..7d2e0b61e5 100644
> > > --- a/builtin/commit.c
> > > +++ b/builtin/commit.c
> > > @@ -351,7 +351,7 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
> > >  		if (write_locked_index(&the_index, &index_lock, 0))
> > >  			die(_("unable to create temporary index"));
> > >  
> > > -		old_index_env = getenv(INDEX_ENVIRONMENT);
> > > +		old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT));
> > >  		setenv(INDEX_ENVIRONMENT, get_lock_file_path(&index_lock), 1);
> > >  
> > >  		if (interactive_add(argc, argv, prefix, patch_interactive) != 0)
> > > @@ -361,6 +361,7 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
> > >  			setenv(INDEX_ENVIRONMENT, old_index_env, 1);
> > >  		else
> > >  			unsetenv(INDEX_ENVIRONMENT);
> > > +		FREE_AND_NULL(old_index_env);
> > >  
> > >  		discard_cache();
> > >  		read_cache_from(get_lock_file_path(&index_lock));
> > 
> > Even though it is not wrong per-se to assign a NULL to the
> > now-no-longer-referenced variable, I do not quite get why it is
> > free-and-null, not a straight free.  This may be a taste-thing,
> > though.
> > 
> > Even if a future update needs to make it possible to access
> > old_index_env somewhere in the block after discard_cache() gets
> > called, we would need to push down the free (or free-and-null) to
> > prolong its lifetime a bit anyway, so...
> 
> My thinking was that if we simply call free(), then the variable is left
> as a dangling pointer for the rest of the function, making it easy to
> accidentally use-after-free.

FWIW I thought that was your reasoning (and did not think of asking you
about it) and totally agree with it.

It is *too* easy not to realize that the `free()` call needs to be moved,
but a segmentation fault is a very strong indicator that it should be
moved.

> But certainly it would not be the first such instance in our code base.

Just because a lot of our code has grown historically does not mean that
we need to add code that shares the same shortcomings. FREE_AND_NULL() was
not available for a long time, after all, so it is understandable that we
did not use it back then. But it is available now, so we no longer have an
excuse to add less defensive code.

> In theory a static analyzer should easily be able to figure out such a
> problem, too, so maybe it is not worth being defensive about.

How often do you run a static analyzer?

My point being: if we can prevent future mistakes easily, and it does not
add too much code churn, why not just do it. No need to rely on fancy
stuff that might not even be available on your preferred platform.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] getenv() timing fixes
  2019-01-12 11:31 ` [PATCH 0/6] getenv() timing fixes Ævar Arnfjörð Bjarmason
  2019-01-12 18:51   ` Stefan Beller
@ 2019-01-15 19:12   ` Jeff King
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff King @ 2019-01-15 19:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Sat, Jan 12, 2019 at 12:31:21PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > So anyway. Here are a handful of what seem like pretty low-hanging
> > fruit. Beyond the first one, I'm not sure if they're triggerable, but
> > they're easy to fix. There are 100+ grep matches that I _didn't_ audit,
> > so this is by no means a complete fix. I was mostly trying to get a
> > sense of how painful these fixes would be.
> 
> I wonder, and not as "you should do this" feedback on this series, just
> on future development, whether we shouldn't just make our own getenv()
> wrapper for the majority of the GIT_* variables. The semantics would be
> fetch value X, and if it's ever requested again return the value we
> found the first time.

Yeah, that thought certainly crossed my mind while looking into this.
But as you noted below, you do sometimes have to worry about
invalidating that cache. The most general solution is that you'd hook
setenv(), too. At which point you've basically just constructed a shadow
environment that has less-crappy semantics than what POSIX guarantees. ;)

Another option is to just have a getenv_safe() that always returns an
allocated string. That's less efficient in some cases, but probably not
meaningfully so (you probably shouldn't be calling getenv() in a tight
loop anyway). It does mean dealing with memory ownership, though, which
is awkward in some cases (e.g., see git_editor).

Mostly I'm worried about making a system that's opaque or easy for
people to get wrong (e.g., if our getenv() wrapper quietly caches things
but setenv() does not invalidate that cache, that's a recipe for
confusion).

> Maybe such an API would just loop over "environ" on startup, looking for
> any GIT_* variables, i.e. called from the setup.c code.

I think whatever we do could just lazy-load. There's no particular
initialization we have to do at the beginning of the program.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] getenv() timing fixes
  2019-01-12 18:51   ` Stefan Beller
@ 2019-01-15 19:13     ` Jeff King
  2019-01-15 19:32       ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-15 19:13 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Ævar Arnfjörð Bjarmason, git

On Sat, Jan 12, 2019 at 10:51:42AM -0800, Stefan Beller wrote:

> > I wonder, and not as "you should do this" feedback on this series, just
> 
> There is a getenv_safe() in environment.c, but I guess a xgetenv() that
> takes the same parameters as getenv() is better for ease of use.

Yes, but it punts on the memory ownership by stuffing everything into an
argv_array. That saves a few lines if you're going to ask for five
variables, but for a single variable it's no better than:

  char *foo = getenv_safe("FOO");

  ...use foo...

  free(foo);

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] commit: copy saved getenv() result
  2019-01-15 14:05       ` Johannes Schindelin
@ 2019-01-15 19:17         ` Jeff King
  2019-01-15 19:25           ` Stefan Beller
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-15 19:17 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

On Tue, Jan 15, 2019 at 03:05:50PM +0100, Johannes Schindelin wrote:

> > But certainly it would not be the first such instance in our code base.
> 
> Just because a lot of our code has grown historically does not mean that
> we need to add code that shares the same shortcomings. FREE_AND_NULL() was
> not available for a long time, after all, so it is understandable that we
> did not use it back then. But it is available now, so we no longer have an
> excuse to add less defensive code.

Fair enough. I am happy to start using FREE_AND_NULL() consistently if
nobody things it's too opaque (or that it creates confusion that we
somehow expect to look at the variable again and need it to be NULL). I
think the compiler should generally be able to optimize out the NULL
assignment in most cases anyway.

> > In theory a static analyzer should easily be able to figure out such a
> > problem, too, so maybe it is not worth being defensive about.
> 
> How often do you run a static analyzer?

Stefan was routinely running coverity, though I haven't seen results in
a while. I think we should make sure that continues, as it did turn up
some useful results (and a lot of cruft, too, but on the whole I have
found it useful).

I also count gcc as a static analyzer, since it can and does point out
many simple problems. I don't know that it can point out an obvious
user-after-free like this, though.

That said....

> My point being: if we can prevent future mistakes easily, and it does not
> add too much code churn, why not just do it. No need to rely on fancy
> stuff that might not even be available on your preferred platform.

Yeah, I do agree (which is why I included it in the patch ;) ).

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] commit: copy saved getenv() result
  2019-01-15 19:17         ` Jeff King
@ 2019-01-15 19:25           ` Stefan Beller
  2019-01-15 19:32             ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2019-01-15 19:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Junio C Hamano, git

> Stefan was routinely running coverity, though I haven't seen results in
> a while. I think we should make sure that continues, as it did turn up
> some useful results (and a lot of cruft, too, but on the whole I have
> found it useful).

coverity had some outage (end of last year?-ish) and then changed the
way it dealt with automated uploads IIRC.
I have not looked into redoing the automation again since then.

Since 7th of Jan, they seem to have issues with hosting (for everyone
or just the open source projects?)
https://community.synopsys.com/s/article/Coverity-Scan-Update

For reference, the script that used to work is at
https://github.com/stefanbeller/git/commit/039be8078bb0379db271135e0c0d7315c34fe243
(which is on the `coverity` branch of that repo)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] commit: copy saved getenv() result
  2019-01-15 19:25           ` Stefan Beller
@ 2019-01-15 19:32             ` Jeff King
  2019-01-16 14:06               ` Johannes Schindelin
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-15 19:32 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Johannes Schindelin, Junio C Hamano, git

On Tue, Jan 15, 2019 at 11:25:45AM -0800, Stefan Beller wrote:

> > Stefan was routinely running coverity, though I haven't seen results in
> > a while. I think we should make sure that continues, as it did turn up
> > some useful results (and a lot of cruft, too, but on the whole I have
> > found it useful).
> 
> coverity had some outage (end of last year?-ish) and then changed the
> way it dealt with automated uploads IIRC.
> I have not looked into redoing the automation again since then.
> 
> Since 7th of Jan, they seem to have issues with hosting (for everyone
> or just the open source projects?)
> https://community.synopsys.com/s/article/Coverity-Scan-Update

Yuck. :(

> For reference, the script that used to work is at
> https://github.com/stefanbeller/git/commit/039be8078bb0379db271135e0c0d7315c34fe243
> (which is on the `coverity` branch of that repo)

Thanks for the update. I may take a look at trying to make this work
again at some point (but I won't be sad if somebody else gets to it
first).

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] getenv() timing fixes
  2019-01-15 19:13     ` Jeff King
@ 2019-01-15 19:32       ` Junio C Hamano
  2019-01-15 19:38         ` Stefan Beller
  2019-01-15 19:41         ` Jeff King
  0 siblings, 2 replies; 26+ messages in thread
From: Junio C Hamano @ 2019-01-15 19:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Ævar Arnfjörð Bjarmason, git

Jeff King <peff@peff.net> writes:

> On Sat, Jan 12, 2019 at 10:51:42AM -0800, Stefan Beller wrote:
>
>> > I wonder, and not as "you should do this" feedback on this series, just
>> 
>> There is a getenv_safe() in environment.c, but I guess a xgetenv() that
>> takes the same parameters as getenv() is better for ease of use.
>
> Yes, but it punts on the memory ownership by stuffing everything into an
> argv_array. That saves a few lines if you're going to ask for five
> variables, but for a single variable it's no better than:
>
>   char *foo = getenv_safe("FOO");

You meant xstrdup_or_null(getenv("FOO")) here?  And did Stefan mean

	#define xgetenv(e) xstrdup_or_null(getenv(e))

?

>   ...use foo...
>
>   free(foo);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] getenv() timing fixes
  2019-01-15 19:32       ` Junio C Hamano
@ 2019-01-15 19:38         ` Stefan Beller
  2019-01-15 19:41         ` Jeff King
  1 sibling, 0 replies; 26+ messages in thread
From: Stefan Beller @ 2019-01-15 19:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Ævar Arnfjörð Bjarmason, git

On Tue, Jan 15, 2019 at 11:32 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > On Sat, Jan 12, 2019 at 10:51:42AM -0800, Stefan Beller wrote:
> >
> >> > I wonder, and not as "you should do this" feedback on this series, just
> >>
> >> There is a getenv_safe() in environment.c, but I guess a xgetenv() that
> >> takes the same parameters as getenv() is better for ease of use.
> >
> > Yes, but it punts on the memory ownership by stuffing everything into an
> > argv_array. That saves a few lines if you're going to ask for five
> > variables, but for a single variable it's no better than:
> >
> >   char *foo = getenv_safe("FOO");
>
> You meant xstrdup_or_null(getenv("FOO")) here?  And did Stefan mean
>
>         #define xgetenv(e) xstrdup_or_null(getenv(e))
>
> ?

Assume I did. (I thought of it as a function effectively
adding the xstrdup_or_null)

If we go further into assuming the usage patterns of
these xgetenv calls, we might throw in an UNLEAK
as well, but that might be over board.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] getenv() timing fixes
  2019-01-15 19:32       ` Junio C Hamano
  2019-01-15 19:38         ` Stefan Beller
@ 2019-01-15 19:41         ` Jeff King
  2019-01-15 19:47           ` Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-15 19:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Ævar Arnfjörð Bjarmason, git

On Tue, Jan 15, 2019 at 11:32:56AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Sat, Jan 12, 2019 at 10:51:42AM -0800, Stefan Beller wrote:
> >
> >> > I wonder, and not as "you should do this" feedback on this series, just
> >> 
> >> There is a getenv_safe() in environment.c, but I guess a xgetenv() that
> >> takes the same parameters as getenv() is better for ease of use.
> >
> > Yes, but it punts on the memory ownership by stuffing everything into an
> > argv_array. That saves a few lines if you're going to ask for five
> > variables, but for a single variable it's no better than:
> >
> >   char *foo = getenv_safe("FOO");
> 
> You meant xstrdup_or_null(getenv("FOO")) here?  And did Stefan mean
> 
> 	#define xgetenv(e) xstrdup_or_null(getenv(e))
> 
> ?

Yes, I think that would be one possible implementation of a "safe"
getenv (and what I was thinking of specifically in that example).

The more involved one (that doesn't pass along memory ownership) is
something like:

  static struct hashmap env_cache;

  const char *getenv_safe(const char *name)
  {

	if (e = hashmap_get(&env_cache, name))
		return e->value;

        /* need some trickery to make sure xstrdup does not call getenv */
	e->value = xstrdup_or_null(getenv(name));
	e->name = xstrdup(name);
	hashmap_put(&env_cache, e);

	return e->value;
  }

with a matching setenv_safe() to drop the hashmap entry. Come to think
of it, this is really pretty equivalent to string-interning, which we
already have a hashmap for. I think one could argue that string
interning is basically just a controlled form of memory leaking, but
it's probably a reasonable compromise in this instance (i.e., we expect
to ask about a finite number of variables anyway; the important thing is
just that we don't leak memory for the same variable over and over).

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] getenv() timing fixes
  2019-01-15 19:41         ` Jeff King
@ 2019-01-15 19:47           ` Jeff King
  2019-01-15 20:49             ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2019-01-15 19:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Ævar Arnfjörð Bjarmason, git

On Tue, Jan 15, 2019 at 02:41:42PM -0500, Jeff King wrote:

> The more involved one (that doesn't pass along memory ownership) is
> something like:
> 
>   static struct hashmap env_cache;
> 
>   const char *getenv_safe(const char *name)
>   {
> 
> 	if (e = hashmap_get(&env_cache, name))
> 		return e->value;
> 
>         /* need some trickery to make sure xstrdup does not call getenv */
> 	e->value = xstrdup_or_null(getenv(name));
> 	e->name = xstrdup(name);
> 	hashmap_put(&env_cache, e);
> 
> 	return e->value;
>   }
> 
> with a matching setenv_safe() to drop the hashmap entry. Come to think
> of it, this is really pretty equivalent to string-interning, which we
> already have a hashmap for. I think one could argue that string
> interning is basically just a controlled form of memory leaking, but
> it's probably a reasonable compromise in this instance (i.e., we expect
> to ask about a finite number of variables anyway; the important thing is
> just that we don't leak memory for the same variable over and over).

So actually, that's pretty easy to do without writing much code at all.
Something like:

  #define xgetenv(name) strintern(getenv(name))

It means we're effectively storing the environment twice in the worst
case, but that's probably not a big deal. Unless we have a loop which
does repeated setenv()/getenv() calls, the strintern hashmap can't grow
without bound.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] getenv() timing fixes
  2019-01-15 19:47           ` Jeff King
@ 2019-01-15 20:49             ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2019-01-15 20:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Ævar Arnfjörð Bjarmason, git

Jeff King <peff@peff.net> writes:

> So actually, that's pretty easy to do without writing much code at all.
> Something like:
>
>   #define xgetenv(name) strintern(getenv(name))
>
> It means we're effectively storing the environment twice in the worst
> case, but that's probably not a big deal. Unless we have a loop which
> does repeated setenv()/getenv() calls, the strintern hashmap can't grow
> without bound.

Makes sense.



 hashmap.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hashmap.h b/hashmap.h
index d375d9cce7..cff77f9890 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -432,6 +432,8 @@ static inline void hashmap_enable_item_counting(struct hashmap *map)
 extern const void *memintern(const void *data, size_t len);
 static inline const char *strintern(const char *string)
 {
+	if (!string)
+		return string;
 	return memintern(string, strlen(string));
 }
 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] commit: copy saved getenv() result
  2019-01-15 19:32             ` Jeff King
@ 2019-01-16 14:06               ` Johannes Schindelin
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Schindelin @ 2019-01-16 14:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Junio C Hamano, git

Hi Peff,

On Tue, 15 Jan 2019, Jeff King wrote:

> On Tue, Jan 15, 2019 at 11:25:45AM -0800, Stefan Beller wrote:
> 
> > > Stefan was routinely running coverity, though I haven't seen results in
> > > a while. I think we should make sure that continues, as it did turn up
> > > some useful results (and a lot of cruft, too, but on the whole I have
> > > found it useful).
> > 
> > coverity had some outage (end of last year?-ish) and then changed the
> > way it dealt with automated uploads IIRC.
> > I have not looked into redoing the automation again since then.
> > 
> > Since 7th of Jan, they seem to have issues with hosting (for everyone
> > or just the open source projects?)
> > https://community.synopsys.com/s/article/Coverity-Scan-Update
> 
> Yuck. :(

What a timely coincidence to corroborate my argument, eh?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-01-16 14:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
2019-01-11 22:15 ` [PATCH 1/6] get_super_prefix(): copy getenv() result Jeff King
2019-01-12  3:02   ` Junio C Hamano
2019-01-11 22:15 ` [PATCH 2/6] commit: copy saved " Jeff King
2019-01-12  3:07   ` Junio C Hamano
2019-01-12 10:26     ` Jeff King
2019-01-15 14:05       ` Johannes Schindelin
2019-01-15 19:17         ` Jeff King
2019-01-15 19:25           ` Stefan Beller
2019-01-15 19:32             ` Jeff King
2019-01-16 14:06               ` Johannes Schindelin
2019-01-11 22:15 ` [PATCH 3/6] config: make a copy of $GIT_CONFIG string Jeff King
2019-01-11 22:16 ` [PATCH 4/6] init: make a copy of $GIT_DIR string Jeff King
2019-01-12  3:08   ` Junio C Hamano
2019-01-11 22:16 ` [PATCH 5/6] merge-recursive: copy $GITHEAD strings Jeff King
2019-01-12  3:10   ` Junio C Hamano
2019-01-11 22:17 ` [PATCH 6/6] builtin_diff(): read $GIT_DIFF_OPTS closer to use Jeff King
2019-01-12 11:31 ` [PATCH 0/6] getenv() timing fixes Ævar Arnfjörð Bjarmason
2019-01-12 18:51   ` Stefan Beller
2019-01-15 19:13     ` Jeff King
2019-01-15 19:32       ` Junio C Hamano
2019-01-15 19:38         ` Stefan Beller
2019-01-15 19:41         ` Jeff King
2019-01-15 19:47           ` Jeff King
2019-01-15 20:49             ` Junio C Hamano
2019-01-15 19:12   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).