git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] Safe-guard the Windows code a bit more against getenv() problems
@ 2019-02-15 15:17 Johannes Schindelin via GitGitGadget
  2019-02-15 15:17 ` [PATCH 1/1] mingw: safe-guard " Johannes Schindelin via GitGitGadget
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-02-15 15:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

We saw a couple of getenv() cleanups, where we now immediately duplicate the
return value of getenv() in order to avoid problems. However, I am really
uncomfortable with the current value (30) of GETENV_MAX_RETAIN in 
compat/mingw.c: it strikes me as low, given that the average number of 
getenv() call per Git process in Git's test suite is already 40.

I'd really like to increase it, even if it won't help problems where 
getenv() is called in a loop whose number of iteration depends on user input
(like the problem fixed in 6776a84dae (diff: ensure correct lifetime of
external_diff_cmd, 2019-01-11)). It will still fend off plenty of other
cases, I believe.

And yes, it's in my TODOs to look back over those getenv() issues after
v2.21.0 (see https://github.com/git-for-windows/git/pull/2019 for my
progress).

Johannes Schindelin (1):
  mingw: safe-guard a bit more against getenv() problems

 compat/mingw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 2d08f3deb9feb73dc8d21d75bfd367839fc1322c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-127%2Fdscho%2Fmingw-retain-more-getenv-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-127/dscho/mingw-retain-more-getenv-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/127
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/1] mingw: safe-guard a bit more against getenv() problems
  2019-02-15 15:17 [PATCH 0/1] Safe-guard the Windows code a bit more against getenv() problems Johannes Schindelin via GitGitGadget
@ 2019-02-15 15:17 ` Johannes Schindelin via GitGitGadget
  2019-02-15 16:14   ` Jeff Hostetler
  2019-02-21 13:52   ` Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-02-15 15:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Running up to v2.21.0, we fixed two bugs that were made prominent by the
Windows-specific change to retain copies of only the 30 latest getenv()
calls' returned strings, invalidating any copies of previous getenv()
calls' return values.

While this really shines a light onto bugs of the form where we hold
onto getenv()'s return values without copying them, it is also a real
problem for users.

And even if Jeff King's patches merged via 773e408881 (Merge branch
'jk/save-getenv-result', 2019-01-29) provide further work on that front,
we are far from done. Just one example: on Windows, we unset environment
variables when spawning new processes, which potentially invalidates
strings that were previously obtained via getenv(), and therefore we
have to duplicate environment values that are somehow involved in
spawning new processes (e.g. GIT_MAN_VIEWER in show_man_page()).

We do not have a chance to investigate, let address, all of those issues
in time for v2.21.0, so let's at least help Windows users by increasing
the number of getenv() calls' return values that are kept valid. The
number 64 was determined by looking at the average number of getenv()
calls per process in the entire test suite run on Windows (which is
around 40) and then adding a bit for good measure. And it is a power of
two (which would have hit yesterday's theme perfectly).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 compat/mingw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 4276297595..8141f77189 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1632,7 +1632,7 @@ int mingw_kill(pid_t pid, int sig)
  */
 char *mingw_getenv(const char *name)
 {
-#define GETENV_MAX_RETAIN 30
+#define GETENV_MAX_RETAIN 64
 	static char *values[GETENV_MAX_RETAIN];
 	static int value_counter;
 	int len_key, len_value;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] mingw: safe-guard a bit more against getenv() problems
  2019-02-15 15:17 ` [PATCH 1/1] mingw: safe-guard " Johannes Schindelin via GitGitGadget
@ 2019-02-15 16:14   ` Jeff Hostetler
  2019-02-15 18:25     ` Junio C Hamano
  2019-02-18 18:57     ` Johannes Schindelin
  2019-02-21 13:52   ` Jeff King
  1 sibling, 2 replies; 7+ messages in thread
From: Jeff Hostetler @ 2019-02-15 16:14 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Junio C Hamano, Johannes Schindelin



On 2/15/2019 10:17 AM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> Running up to v2.21.0, we fixed two bugs that were made prominent by the
> Windows-specific change to retain copies of only the 30 latest getenv()
> calls' returned strings, invalidating any copies of previous getenv()
> calls' return values.
> 
> While this really shines a light onto bugs of the form where we hold
> onto getenv()'s return values without copying them, it is also a real
> problem for users.
> 
> And even if Jeff King's patches merged via 773e408881 (Merge branch
> 'jk/save-getenv-result', 2019-01-29) provide further work on that front,
> we are far from done. Just one example: on Windows, we unset environment
> variables when spawning new processes, which potentially invalidates
> strings that were previously obtained via getenv(), and therefore we
> have to duplicate environment values that are somehow involved in
> spawning new processes (e.g. GIT_MAN_VIEWER in show_man_page()).
> 
> We do not have a chance to investigate, let address, all of those issues
> in time for v2.21.0, so let's at least help Windows users by increasing
> the number of getenv() calls' return values that are kept valid. The
> number 64 was determined by looking at the average number of getenv()
> calls per process in the entire test suite run on Windows (which is
> around 40) and then adding a bit for good measure. And it is a power of
> two (which would have hit yesterday's theme perfectly).
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>   compat/mingw.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/compat/mingw.c b/compat/mingw.c
> index 4276297595..8141f77189 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -1632,7 +1632,7 @@ int mingw_kill(pid_t pid, int sig)
>    */
>   char *mingw_getenv(const char *name)
>   {
> -#define GETENV_MAX_RETAIN 30
> +#define GETENV_MAX_RETAIN 64
>   	static char *values[GETENV_MAX_RETAIN];
>   	static int value_counter;
>   	int len_key, len_value;
> 

Why not use a mem_pool for this?  We have that code isolated
and re-usable now.  Have mingw_getenv() copy the string into
the pool always return the pointer from within the pool.  The
pool automatically handles allocating new blocks as necessary.
And (if we care) we can bulk free the pool before existing.

Jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] mingw: safe-guard a bit more against getenv() problems
  2019-02-15 16:14   ` Jeff Hostetler
@ 2019-02-15 18:25     ` Junio C Hamano
  2019-02-18 18:57     ` Johannes Schindelin
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2019-02-15 18:25 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Jeff Hostetler <git@jeffhostetler.com> writes:

>> diff --git a/compat/mingw.c b/compat/mingw.c
>> index 4276297595..8141f77189 100644
>> --- a/compat/mingw.c
>> +++ b/compat/mingw.c
>> @@ -1632,7 +1632,7 @@ int mingw_kill(pid_t pid, int sig)
>>    */
>>   char *mingw_getenv(const char *name)
>>   {
>> -#define GETENV_MAX_RETAIN 30
>> +#define GETENV_MAX_RETAIN 64
>>   	static char *values[GETENV_MAX_RETAIN];
>>   	static int value_counter;
>>   	int len_key, len_value;
>>
>
> Why not use a mem_pool for this?  We have that code isolated
> and re-usable now.  Have mingw_getenv() copy the string into
> the pool always return the pointer from within the pool.  The
> pool automatically handles allocating new blocks as necessary.
> And (if we care) we can bulk free the pool before existing.

Probably a good idea in the longer term.  The patch as posted would
do for the upcoming release, though.

Thanks, both.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] mingw: safe-guard a bit more against getenv() problems
  2019-02-15 16:14   ` Jeff Hostetler
  2019-02-15 18:25     ` Junio C Hamano
@ 2019-02-18 18:57     ` Johannes Schindelin
  2019-02-21 13:58       ` Jeff King
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2019-02-18 18:57 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi Jeff,

On Fri, 15 Feb 2019, Jeff Hostetler wrote:

> On 2/15/2019 10:17 AM, Johannes Schindelin via GitGitGadget wrote:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > 
> > diff --git a/compat/mingw.c b/compat/mingw.c
> > index 4276297595..8141f77189 100644
> > --- a/compat/mingw.c
> > +++ b/compat/mingw.c
> > @@ -1632,7 +1632,7 @@ int mingw_kill(pid_t pid, int sig)
> >   */
> >   char *mingw_getenv(const char *name)
> >   {
> > -#define GETENV_MAX_RETAIN 30
> > +#define GETENV_MAX_RETAIN 64
> >    static char *values[GETENV_MAX_RETAIN];
> >    static int value_counter;
> >    int len_key, len_value;
> > 
> 
> Why not use a mem_pool for this?  We have that code isolated
> and re-usable now.  Have mingw_getenv() copy the string into
> the pool always return the pointer from within the pool.  The
> pool automatically handles allocating new blocks as necessary.
> And (if we care) we can bulk free the pool before existing.

The problem with the mem_pool would be that I do *not* want to accumulate
tons and tons of stale copies (remember that `difftool` bug? Apparently it
calls `getenv()` in a loop that scales with the number of modified files;
I would hate to have copies for all of those).

And all you could do to remedy this would be to blow away all of them it
the mem_pool grew too much, *including* the latest `getenv()` (which we
painfully keep valid, e.g. in the case of `git_committer_info()` which
calls `getenv()` *three* times *inside* the parameter list of the
`fmt_ident()` function call.

Of course, we could use a hashmap combined with a priority queue that
would cap at a given capacity, and once that is reached, culls the oldest
entries until the capacity is maintained again.

Taking a step back, thought, we are talking only about the
Windows-specific code, which is code we happen to have *some* control
over. On other platforms, we have a lot less control.

Meaning: the true fix would be some sort of static analysis that ensures
that getenv()'s returned values are duplicated unless it is super easy to
reason that the use is safe.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] mingw: safe-guard a bit more against getenv() problems
  2019-02-15 15:17 ` [PATCH 1/1] mingw: safe-guard " Johannes Schindelin via GitGitGadget
  2019-02-15 16:14   ` Jeff Hostetler
@ 2019-02-21 13:52   ` Jeff King
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff King @ 2019-02-21 13:52 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Junio C Hamano, Johannes Schindelin

On Fri, Feb 15, 2019 at 07:17:45AM -0800, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> Running up to v2.21.0, we fixed two bugs that were made prominent by the
> Windows-specific change to retain copies of only the 30 latest getenv()
> calls' returned strings, invalidating any copies of previous getenv()
> calls' return values.
> 
> While this really shines a light onto bugs of the form where we hold
> onto getenv()'s return values without copying them, it is also a real
> problem for users.
> 
> And even if Jeff King's patches merged via 773e408881 (Merge branch
> 'jk/save-getenv-result', 2019-01-29) provide further work on that front,
> we are far from done. Just one example: on Windows, we unset environment
> variables when spawning new processes, which potentially invalidates
> strings that were previously obtained via getenv(), and therefore we
> have to duplicate environment values that are somehow involved in
> spawning new processes (e.g. GIT_MAN_VIEWER in show_man_page()).

Belated review, as this is already in master, but: yes, I absolutely
support this patch, even on top of my patches. Those were just the cases
I found by poking around for a few minutes, and I'm sure there are many
more.

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] mingw: safe-guard a bit more against getenv() problems
  2019-02-18 18:57     ` Johannes Schindelin
@ 2019-02-21 13:58       ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2019-02-21 13:58 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff Hostetler, Johannes Schindelin via GitGitGadget, git,
	Junio C Hamano

On Mon, Feb 18, 2019 at 07:57:14PM +0100, Johannes Schindelin wrote:

> > Why not use a mem_pool for this?  We have that code isolated
> > and re-usable now.  Have mingw_getenv() copy the string into
> > the pool always return the pointer from within the pool.  The
> > pool automatically handles allocating new blocks as necessary.
> > And (if we care) we can bulk free the pool before existing.
> 
> The problem with the mem_pool would be that I do *not* want to accumulate
> tons and tons of stale copies (remember that `difftool` bug? Apparently it
> calls `getenv()` in a loop that scales with the number of modified files;
> I would hate to have copies for all of those).
> 
> And all you could do to remedy this would be to blow away all of them it
> the mem_pool grew too much, *including* the latest `getenv()` (which we
> painfully keep valid, e.g. in the case of `git_committer_info()` which
> calls `getenv()` *three* times *inside* the parameter list of the
> `fmt_ident()` function call.

One suggestion I made elsewhere[1] is to use strintern() to de-dup the
responses, and then keep them around forever.  That bounds the memory
usage by the number of unique environment variables requested (which is
probably finite and limited), rather than the number of getenv() calls
(which as we have seen can be done in an arbitrary-sized loop).

I say "probably" because it's possible that there is code which mutates
the environment, reads it back, etc, in a loop. But that would be pretty
out-of-character for Git, I think.

-Peff

[1] https://public-inbox.org/git/20190115194726.GA5818@sigill.intra.peff.net/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-02-21 13:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-15 15:17 [PATCH 0/1] Safe-guard the Windows code a bit more against getenv() problems Johannes Schindelin via GitGitGadget
2019-02-15 15:17 ` [PATCH 1/1] mingw: safe-guard " Johannes Schindelin via GitGitGadget
2019-02-15 16:14   ` Jeff Hostetler
2019-02-15 18:25     ` Junio C Hamano
2019-02-18 18:57     ` Johannes Schindelin
2019-02-21 13:58       ` Jeff King
2019-02-21 13:52   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).