git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] revision: fix memory leaks with `struct cmdline_pathspec`
@ 2017-09-20 19:47 Martin Ågren
  2017-09-20 20:25 ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Ågren @ 2017-09-20 19:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

We don't free the array `prune_data.path` or the individual strings it
points to. Do so by introducing and using `free_cmdline_pathspec()`. To
be able to safely free the strings, always use `xstrdup()` when
assigning them. That does mean we allocate more memory than we used to,
but it also means it is clear who owns the strings and that we can stop
leaking those that we do allocate.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 revision.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/revision.c b/revision.c
index f9a90d7..dfb6a6c 100644
--- a/revision.c
+++ b/revision.c
@@ -1682,7 +1682,7 @@ static void append_prune_data(struct cmdline_pathspec *prune, const char **av)
 {
 	while (*av) {
 		ALLOC_GROW(prune->path, prune->nr + 1, prune->alloc);
-		prune->path[prune->nr++] = *(av++);
+		prune->path[prune->nr++] = xstrdup(*(av++));
 	}
 }
 
@@ -1695,6 +1695,18 @@ static void read_pathspec_from_stdin(struct rev_info *revs, struct strbuf *sb,
 	}
 }
 
+static void free_cmdline_pathspec(struct cmdline_pathspec *prune)
+{
+	int i;
+
+	for (i = 0; i < prune->nr; i++)
+		free((void *)prune->path[i]);
+
+	FREE_AND_NULL(prune->path);
+	prune->nr = 0;
+	prune->alloc = 0;
+}
+
 static void read_revisions_from_stdin(struct rev_info *revs,
 				      struct cmdline_pathspec *prune)
 {
@@ -2392,6 +2404,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 		prune_data.path[prune_data.nr++] = NULL;
 		parse_pathspec(&revs->prune_data, 0, 0,
 			       revs->prefix, prune_data.path);
+		free_cmdline_pathspec(&prune_data);
 	}
 
 	if (revs->def == NULL)
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: fix memory leaks with `struct cmdline_pathspec`
  2017-09-20 19:47 [PATCH] revision: fix memory leaks with `struct cmdline_pathspec` Martin Ågren
@ 2017-09-20 20:25 ` Jeff King
  2017-09-20 20:36   ` [PATCH] revision: replace "struct cmdline_pathspec" with argv_array Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2017-09-20 20:25 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Junio C Hamano

On Wed, Sep 20, 2017 at 09:47:26PM +0200, Martin Ågren wrote:

> We don't free the array `prune_data.path` or the individual strings it
> points to. Do so by introducing and using `free_cmdline_pathspec()`. To
> be able to safely free the strings, always use `xstrdup()` when
> assigning them. That does mean we allocate more memory than we used to,
> but it also means it is clear who owns the strings and that we can stop
> leaking those that we do allocate.

Hmm. From this description (and from looking at the patch), it seems
like we could just skip the allocation.

The missing piece of the puzzle is that sometimes we call
append_prune_data() to append from argv, and sometimes we use
read_pathspec_from_stdin().

So we may literally have a mix-and-match of allocated and unallocated
entries, and the only sane way to resolve that is by making them all
allocated.

So I think this solves the problem, but I couldn't help notice...

> @@ -1682,7 +1682,7 @@ static void append_prune_data(struct cmdline_pathspec *prune, const char **av)
>  {
>  	while (*av) {
>  		ALLOC_GROW(prune->path, prune->nr + 1, prune->alloc);
> -		prune->path[prune->nr++] = *(av++);
> +		prune->path[prune->nr++] = xstrdup(*(av++));
>  	}

Isn't this whole thing just an argv_array, and this is argv_array_pushv?
We even NULL-terminate it manually later on!

So rather than increasing the line count by adding
free_cmdline_pathspec, I think we could actually _reduce_ it by
converting to an argv array, as below. And then adding in your free
would be one extra line.

diff --git a/revision.c b/revision.c
index 94a5e98525..5c58b3fb2b 100644
--- a/revision.c
+++ b/revision.c
@@ -20,6 +20,7 @@
 #include "cache-tree.h"
 #include "bisect.h"
 #include "packfile.h"
+#include "argv-array.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -1612,31 +1613,15 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 	return 0;
 }
 
-struct cmdline_pathspec {
-	int alloc;
-	int nr;
-	const char **path;
-};
-
-static void append_prune_data(struct cmdline_pathspec *prune, const char **av)
-{
-	while (*av) {
-		ALLOC_GROW(prune->path, prune->nr + 1, prune->alloc);
-		prune->path[prune->nr++] = *(av++);
-	}
-}
-
 static void read_pathspec_from_stdin(struct rev_info *revs, struct strbuf *sb,
-				     struct cmdline_pathspec *prune)
+				     struct argv_array *prune)
 {
-	while (strbuf_getline(sb, stdin) != EOF) {
-		ALLOC_GROW(prune->path, prune->nr + 1, prune->alloc);
-		prune->path[prune->nr++] = xstrdup(sb->buf);
-	}
+	while (strbuf_getline(sb, stdin) != EOF)
+		argv_array_push(prune, sb->buf);
 }
 
 static void read_revisions_from_stdin(struct rev_info *revs,
-				      struct cmdline_pathspec *prune)
+				      struct argv_array *prune)
 {
 	struct strbuf sb;
 	int seen_dashdash = 0;
@@ -2201,10 +2186,9 @@ static void NORETURN diagnose_missing_default(const char *def)
 int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct setup_revision_opt *opt)
 {
 	int i, flags, left, seen_dashdash, read_from_stdin, got_rev_arg = 0, revarg_opt;
-	struct cmdline_pathspec prune_data;
+	struct argv_array prune_data = ARGV_ARRAY_INIT;
 	const char *submodule = NULL;
 
-	memset(&prune_data, 0, sizeof(prune_data));
 	if (opt)
 		submodule = opt->submodule;
 
@@ -2220,7 +2204,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			argv[i] = NULL;
 			argc = i;
 			if (argv[i + 1])
-				append_prune_data(&prune_data, argv + i + 1);
+				argv_array_pushv(&prune_data, argv + i + 1);
 			seen_dashdash = 1;
 			break;
 		}
@@ -2281,14 +2265,14 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			for (j = i; j < argc; j++)
 				verify_filename(revs->prefix, argv[j], j == i);
 
-			append_prune_data(&prune_data, argv + i);
+			argv_array_pushv(&prune_data, argv + i);
 			break;
 		}
 		else
 			got_rev_arg = 1;
 	}
 
-	if (prune_data.nr) {
+	if (prune_data.argc) {
 		/*
 		 * If we need to introduce the magic "a lone ':' means no
 		 * pathspec whatsoever", here is the place to do so.
@@ -2303,10 +2287,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 		 *	call init_pathspec() to set revs->prune_data here.
 		 * }
 		 */
-		ALLOC_GROW(prune_data.path, prune_data.nr + 1, prune_data.alloc);
-		prune_data.path[prune_data.nr++] = NULL;
 		parse_pathspec(&revs->prune_data, 0, 0,
-			       revs->prefix, prune_data.path);
+			       revs->prefix, prune_data.argv);
 	}
 
 	if (revs->def == NULL)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-20 20:25 ` Jeff King
@ 2017-09-20 20:36   ` Jeff King
  2017-09-20 22:48     ` Jonathan Nieder
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jeff King @ 2017-09-20 20:36 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Junio C Hamano

On Wed, Sep 20, 2017 at 04:25:52PM -0400, Jeff King wrote:

> Isn't this whole thing just an argv_array, and this is argv_array_pushv?
> We even NULL-terminate it manually later on!
> 
> So rather than increasing the line count by adding
> free_cmdline_pathspec, I think we could actually _reduce_ it by
> converting to an argv array, as below. And then adding in your free
> would be one extra line.

Here it is with a commit message, and that final free added.

Sorry for stealing your patch, but I didn't want to suggest "couldn't
you replace this with argv_array" without actually seeing if it was
possible. At which point the patch was pretty much done.

-- >8 --
Subject: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array

We assemble an array of strings in a custom struct,
NULL-terminate the result, and then pass it to
parse_pathspec().

But then we never free the array or the individual strings
(nor can we do the latter, as they are heap-allocated when
they come from stdin but not when they come from the
passed-in argv).

Let's swap this out for an argv_array. It does the same
thing with fewer lines of code, and it's safe to call
argv_array_clear() at the end to avoid a memory leak.

Reported-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c | 39 +++++++++++----------------------------
 1 file changed, 11 insertions(+), 28 deletions(-)

diff --git a/revision.c b/revision.c
index f9a90d71d2..1520f69d93 100644
--- a/revision.c
+++ b/revision.c
@@ -21,6 +21,7 @@
 #include "bisect.h"
 #include "packfile.h"
 #include "worktree.h"
+#include "argv-array.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -1672,31 +1673,15 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 	return 0;
 }
 
-struct cmdline_pathspec {
-	int alloc;
-	int nr;
-	const char **path;
-};
-
-static void append_prune_data(struct cmdline_pathspec *prune, const char **av)
-{
-	while (*av) {
-		ALLOC_GROW(prune->path, prune->nr + 1, prune->alloc);
-		prune->path[prune->nr++] = *(av++);
-	}
-}
-
 static void read_pathspec_from_stdin(struct rev_info *revs, struct strbuf *sb,
-				     struct cmdline_pathspec *prune)
+				     struct argv_array *prune)
 {
-	while (strbuf_getline(sb, stdin) != EOF) {
-		ALLOC_GROW(prune->path, prune->nr + 1, prune->alloc);
-		prune->path[prune->nr++] = xstrdup(sb->buf);
-	}
+	while (strbuf_getline(sb, stdin) != EOF)
+		argv_array_push(prune, sb->buf);
 }
 
 static void read_revisions_from_stdin(struct rev_info *revs,
-				      struct cmdline_pathspec *prune)
+				      struct argv_array *prune)
 {
 	struct strbuf sb;
 	int seen_dashdash = 0;
@@ -2286,10 +2271,9 @@ static void NORETURN diagnose_missing_default(const char *def)
 int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct setup_revision_opt *opt)
 {
 	int i, flags, left, seen_dashdash, read_from_stdin, got_rev_arg = 0, revarg_opt;
-	struct cmdline_pathspec prune_data;
+	struct argv_array prune_data = ARGV_ARRAY_INIT;
 	const char *submodule = NULL;
 
-	memset(&prune_data, 0, sizeof(prune_data));
 	if (opt)
 		submodule = opt->submodule;
 
@@ -2305,7 +2289,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			argv[i] = NULL;
 			argc = i;
 			if (argv[i + 1])
-				append_prune_data(&prune_data, argv + i + 1);
+				argv_array_pushv(&prune_data, argv + i + 1);
 			seen_dashdash = 1;
 			break;
 		}
@@ -2366,14 +2350,14 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			for (j = i; j < argc; j++)
 				verify_filename(revs->prefix, argv[j], j == i);
 
-			append_prune_data(&prune_data, argv + i);
+			argv_array_pushv(&prune_data, argv + i);
 			break;
 		}
 		else
 			got_rev_arg = 1;
 	}
 
-	if (prune_data.nr) {
+	if (prune_data.argc) {
 		/*
 		 * If we need to introduce the magic "a lone ':' means no
 		 * pathspec whatsoever", here is the place to do so.
@@ -2388,11 +2372,10 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 		 *	call init_pathspec() to set revs->prune_data here.
 		 * }
 		 */
-		ALLOC_GROW(prune_data.path, prune_data.nr + 1, prune_data.alloc);
-		prune_data.path[prune_data.nr++] = NULL;
 		parse_pathspec(&revs->prune_data, 0, 0,
-			       revs->prefix, prune_data.path);
+			       revs->prefix, prune_data.argv);
 	}
+	argv_array_clear(&prune_data);
 
 	if (revs->def == NULL)
 		revs->def = opt ? opt->def : NULL;
-- 
2.14.1.1040.gcaf8795f39


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-20 20:36   ` [PATCH] revision: replace "struct cmdline_pathspec" with argv_array Jeff King
@ 2017-09-20 22:48     ` Jonathan Nieder
  2017-09-21  3:04       ` Jeff King
  2017-09-21  3:57     ` Martin Ågren
  2017-09-21  4:11     ` Junio C Hamano
  2 siblings, 1 reply; 12+ messages in thread
From: Jonathan Nieder @ 2017-09-20 22:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Ågren, git, Junio C Hamano

Hi,

Jeff King wrote:

> Subject: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
>
> We assemble an array of strings in a custom struct,
> NULL-terminate the result, and then pass it to
> parse_pathspec().
>
> But then we never free the array or the individual strings
> (nor can we do the latter, as they are heap-allocated when
> they come from stdin but not when they come from the
> passed-in argv).

To be devil's advocate for a moment: why don't we use UNLEAK on the
paths passed in from stdin?

It's true that there can be an unbounded number of them, but they all
coexist in memory anyway.  They are not generated dynamically on the
fly.  Being able to free them doesn't have any obvious advantage over
using exit().

Except... is the idea that this allows the strings from stdin to be
freed sooner, as soon as they have been parsed into a "struct
pathspec"?

That sounds appealing.  The only risk would be if "struct pathspec"
holds on to a pointer to one of these strings.

Fortunately parse_pathspec() is careful to strdup any strings it
borrows (though it is not documented to do so).

In other words, proposed changes:

 1. Could the commit message describe what effect this would have on
    maximum heap usage, if any?  (In qualitative terms is fine, though
    actual numbers would be even better if it's easy to get them.)
    That would make it easier to justify not using UNLEAK.

 2. Can parse_pathspec get a comment in pathspec.h saying that it
    defensively copies anything it needs from args so the caller is
    free to modify or free it?  That way, it should be more obvious
    to people in the future modifying parse_pathspec() that callers
    may rely on that.  (The current API comment describes argv as
    "command line arguments", which I fear would send the opposite
    message to implementors.)

> Let's swap this out for an argv_array. It does the same
> thing with fewer lines of code, and it's safe to call
> argv_array_clear() at the end to avoid a memory leak.
>
> Reported-by: Martin Ågren <martin.agren@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  revision.c | 39 +++++++++++----------------------------
>  1 file changed, 11 insertions(+), 28 deletions(-)

The code looks good.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-20 22:48     ` Jonathan Nieder
@ 2017-09-21  3:04       ` Jeff King
  2017-09-21  3:49         ` Jonathan Nieder
  2017-09-21  4:41         ` Jonathan Nieder
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2017-09-21  3:04 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Martin Ågren, git, Junio C Hamano

On Wed, Sep 20, 2017 at 03:48:26PM -0700, Jonathan Nieder wrote:

> > Subject: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
> >
> > We assemble an array of strings in a custom struct,
> > NULL-terminate the result, and then pass it to
> > parse_pathspec().
> >
> > But then we never free the array or the individual strings
> > (nor can we do the latter, as they are heap-allocated when
> > they come from stdin but not when they come from the
> > passed-in argv).
> 
> To be devil's advocate for a moment: why don't we use UNLEAK on the
> paths passed in from stdin?
> 
> It's true that there can be an unbounded number of them, but they all
> coexist in memory anyway.  They are not generated dynamically on the
> fly.  Being able to free them doesn't have any obvious advantage over
> using exit().
>
> Except... is the idea that this allows the strings from stdin to be
> freed sooner, as soon as they have been parsed into a "struct
> pathspec"?

Well, no...the idea is that this is a function which leaks a bunch of
memory, and we shouldn't have to think hard about how often its leak can
be triggered or how severe it is. We should just fix it.

I agree with your analysis that we're likely only to leak one set of
--stdin input per program invocation (modulo some caller re-opening
stdin). And I also agree that it's a waste of memory to hold onto leaked
heap that cannot be used.

But mostly I am fundamentally against using UNLEAK() in a case like
this, because it does not match either of the properties which justified
adding UNLEAK() in the first place:

  1. We are about to exit the program, so the "leak" is only caused by
     the memory going out of scope at that exit.

     By contrast, the revision machinery may be called many times in the
     same program.

  2. The memory remains useful until around the time of program exit.

     This most certainly does not, as it would not even be reachable.

So while it may not do _too_ much harm, aside from increasing peak heap
unnecessarily, it's not a precedent I'd like to set. And certainly not
when we can fix the leak and reduce the code size at the same time.

> That sounds appealing.  The only risk would be if "struct pathspec"
> holds on to a pointer to one of these strings.
> 
> Fortunately parse_pathspec() is careful to strdup any strings it
> borrows (though it is not documented to do so).

A lot of the early interfaces in Git had really confusing
memory-ownership semantics. I think we've been slowly moving over the
years towards interfaces that spend a bit on extra copies to give simple
and consistent semantics (in fact, it was one of the reasons I added
argv_array in the first place).

So yes, it's good to double check that parse_pathspec() doesn't want to
hold on to the pointers. But I also think that should be the default
we're striving for in our APIs.

> In other words, proposed changes:
> 
>  1. Could the commit message describe what effect this would have on
>     maximum heap usage, if any?  (In qualitative terms is fine, though
>     actual numbers would be even better if it's easy to get them.)
>     That would make it easier to justify not using UNLEAK.

What wording are you looking for? It was a leak, and now it's gone.  The
size of the leak depends on how much you feed to --stdin. IMHO using
UNLEAK is totally inappropriate for this case, and doesn't even seem
like an alternative worth rejecting.

>  2. Can parse_pathspec get a comment in pathspec.h saying that it
>     defensively copies anything it needs from args so the caller is
>     free to modify or free it?  That way, it should be more obvious
>     to people in the future modifying parse_pathspec() that callers
>     may rely on that.  (The current API comment describes argv as
>     "command line arguments", which I fear would send the opposite
>     message to implementors.)

I certainly agree that the pathspec interface could use better
documentation. Patches welcome? :)

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-21  3:04       ` Jeff King
@ 2017-09-21  3:49         ` Jonathan Nieder
  2017-09-21  4:48           ` Jeff King
  2017-09-21  4:41         ` Jonathan Nieder
  1 sibling, 1 reply; 12+ messages in thread
From: Jonathan Nieder @ 2017-09-21  3:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Ågren, git, Junio C Hamano

Hi,

Jeff King wrote:

> But mostly I am fundamentally against using UNLEAK() in a case like
> this, because it does not match either of the properties which justified
> adding UNLEAK() in the first place:
>
>   1. We are about to exit the program, so the "leak" is only caused by
>      the memory going out of scope at that exit.
>
>      By contrast, the revision machinery may be called many times in the
>      same program.
>
>   2. The memory remains useful until around the time of program exit.
>
>      This most certainly does not, as it would not even be reachable.
[...]
> On Wed, Sep 20, 2017 at 03:48:26PM -0700, Jonathan Nieder wrote:

>> In other words, proposed changes:
>>
>>  1. Could the commit message describe what effect this would have on
>>     maximum heap usage, if any?  (In qualitative terms is fine, though
>>     actual numbers would be even better if it's easy to get them.)
>>     That would make it easier to justify not using UNLEAK.
>
> What wording are you looking for? It was a leak, and now it's gone.  The
> size of the leak depends on how much you feed to --stdin. IMHO using
> UNLEAK is totally inappropriate for this case, and doesn't even seem
> like an alternative worth rejecting.
>
>>  2. Can parse_pathspec get a comment in pathspec.h saying that it
>>     defensively copies anything it needs from args so the caller is
>>     free to modify or free it?  That way, it should be more obvious
>>     to people in the future modifying parse_pathspec() that callers
>>     may rely on that.  (The current API comment describes argv as
>>     "command line arguments", which I fear would send the opposite
>>     message to implementors.)
>
> I certainly agree that the pathspec interface could use better
> documentation. Patches welcome? :)

I think I failed at communicating here.  That is not what I meant at
all.

The context is that Git (especially older parts of it) suffers from a
pretty severe lack of API documentation.  For newcomers that is
especially obvious --- long-time git contributors, on the other hand,
may get used to patterns and common interfaces and may have trouble
seeing just how bad the lack of clearly communicated API contracts is.

There is a bit of a "broken window" problem, too: authors of one-off
patches may reasonably assume from existing code that this is just the
way it is and, lacking examples of how to document an API, add more
underdocumented API contracts.

The patch I am replying to tightens the contract for parse_pathspec().
I am not asking for comprehensive documentation of that function ---
that would be clearly off-topic for the patch.  Instead, I am saying
that we should document what we are newly requiring of the function in
the patch.  That way, the documented contract becomes clearer over
time as people document what they are relying on.  I think of that as
generally a good practice.

In other words, it was not at all obvious that "(2) The memory remains
useful until around the time of program exit" did not hold.  To a
casual reader it instead looks like (2) does hold, and there's no
documentation short of delving into the implementation of
parse_pathspec() to convince a reader otherwise.  The documentation
that is present leads to the opposite conclusion.

The assertion (1) that this allocation is going to happen multiple
times in a program isn't true either.  As you noted, we only read
stdin once.  But that doesn't matter as long as (2) doesn't hold.

TBH saying that I should write the one-sentence doc patch feels like
a cop-out.  Just like it is not sustainable for those reviewers that
are interested in good test coverage to be the only ones who write
tests, I think we cannot avoid treating documentation of API contracts
as a shared responsibility.

Thanks and hope that clarifies,
Jonathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-20 20:36   ` [PATCH] revision: replace "struct cmdline_pathspec" with argv_array Jeff King
  2017-09-20 22:48     ` Jonathan Nieder
@ 2017-09-21  3:57     ` Martin Ågren
  2017-09-21  4:11     ` Junio C Hamano
  2 siblings, 0 replies; 12+ messages in thread
From: Martin Ågren @ 2017-09-21  3:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Junio C Hamano

On 20 September 2017 at 22:36, Jeff King <peff@peff.net> wrote:
> On Wed, Sep 20, 2017 at 04:25:52PM -0400, Jeff King wrote:
>
>> Isn't this whole thing just an argv_array, and this is argv_array_pushv?
>> We even NULL-terminate it manually later on!
>>
>> So rather than increasing the line count by adding
>> free_cmdline_pathspec, I think we could actually _reduce_ it by
>> converting to an argv array, as below. And then adding in your free
>> would be one extra line.
>
> Here it is with a commit message, and that final free added.
>
> Sorry for stealing your patch, but I didn't want to suggest "couldn't
> you replace this with argv_array" without actually seeing if it was
> possible. At which point the patch was pretty much done.

No need to be sorry. I wasn't aware of the argv_array thing, thanks for
seeing the bigger picture. I haven't looked into the details, or your
and Jonathan's discussion, but just from a cursory look this looks way
better. It reuses existing infrastructure, and then this:

>  revision.c | 39 +++++++++++----------------------------
>  1 file changed, 11 insertions(+), 28 deletions(-)

Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-20 20:36   ` [PATCH] revision: replace "struct cmdline_pathspec" with argv_array Jeff King
  2017-09-20 22:48     ` Jonathan Nieder
  2017-09-21  3:57     ` Martin Ågren
@ 2017-09-21  4:11     ` Junio C Hamano
  2 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2017-09-21  4:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Ågren, git

Jeff King <peff@peff.net> writes:

> Subject: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
>
> We assemble an array of strings in a custom struct,
> NULL-terminate the result, and then pass it to
> parse_pathspec().
>
> But then we never free the array or the individual strings
> (nor can we do the latter, as they are heap-allocated when
> they come from stdin but not when they come from the
> passed-in argv).
>
> Let's swap this out for an argv_array. It does the same
> thing with fewer lines of code, and it's safe to call
> argv_array_clear() at the end to avoid a memory leak.
>
> Reported-by: Martin Ågren <martin.agren@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  revision.c | 39 +++++++++++----------------------------
>  1 file changed, 11 insertions(+), 28 deletions(-)

Makes sense.  Thanks for cleaning up the mess I made.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-21  3:04       ` Jeff King
  2017-09-21  3:49         ` Jonathan Nieder
@ 2017-09-21  4:41         ` Jonathan Nieder
  2017-09-21  4:50           ` Jeff King
  2017-09-21  5:10           ` Junio C Hamano
  1 sibling, 2 replies; 12+ messages in thread
From: Jonathan Nieder @ 2017-09-21  4:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Ågren, git, Junio C Hamano

Jeff King wrote:
> On Wed, Sep 20, 2017 at 03:48:26PM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> Subject: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
>>>
>>> We assemble an array of strings in a custom struct,
>>> NULL-terminate the result, and then pass it to
>>> parse_pathspec().
>>>
>>> But then we never free the array or the individual strings
>>> (nor can we do the latter, as they are heap-allocated when
>>> they come from stdin but not when they come from the
>>> passed-in argv).
[...]
>> Except... is the idea that this allows the strings from stdin to be
>> freed sooner, as soon as they have been parsed into a "struct
>> pathspec"?
>
> Well, no...the idea is that this is a function which leaks a bunch of
> memory, and we shouldn't have to think hard about how often its leak can
> be triggered or how severe it is. We should just fix it.

I forgot to say: thanks for writing such a pleasant patch.

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

[...]
> I certainly agree that the pathspec interface could use better
> documentation. Patches welcome? :)

Here's such a patch.

-- 8< --
Subject: pathspec doc: parse_pathspec does not maintain references to args

The command line arguments passed to main() are valid for the life of
a program, but the same is not true for all other argv-style arrays
(e.g.  when a caller creates an argv_array).  Clarify that
parse_pathspec does not rely on the argv passed to it to remain valid.

This makes it easier to tell that callers like "git rev-list --stdin"
are safe and ensures that that is more likely to remain true as the
implementation of parse_pathspec evolves.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 pathspec.c | 4 ----
 pathspec.h | 7 +++++++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index e2a23ebc96..cdefdc7cc0 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -526,10 +526,6 @@ static void NORETURN unsupported_magic(const char *pattern,
 	    pattern, sb.buf);
 }
 
-/*
- * Given command line arguments and a prefix, convert the input to
- * pathspec. die() if any magic in magic_mask is used.
- */
 void parse_pathspec(struct pathspec *pathspec,
 		    unsigned magic_mask, unsigned flags,
 		    const char *prefix, const char **argv)
diff --git a/pathspec.h b/pathspec.h
index 60e6500401..6420d1080a 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -70,6 +70,13 @@ struct pathspec {
  */
 #define PATHSPEC_LITERAL_PATH (1<<6)
 
+/*
+ * Given command line arguments and a prefix, convert the input to
+ * pathspec. die() if any magic in magic_mask is used.
+ *
+ * Any arguments used are copied. It is safe for the caller to modify
+ * or free 'prefix' and 'args' after calling this function.
+ */
 extern void parse_pathspec(struct pathspec *pathspec,
 			   unsigned magic_mask,
 			   unsigned flags,
-- 
2.14.1.821.g8fa685d3b7


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-21  3:49         ` Jonathan Nieder
@ 2017-09-21  4:48           ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2017-09-21  4:48 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Martin Ågren, git, Junio C Hamano

On Wed, Sep 20, 2017 at 08:49:00PM -0700, Jonathan Nieder wrote:

> The patch I am replying to tightens the contract for parse_pathspec().

I disagree with this bit. In my view that was already the contract for
parse_pathspec(), and it is simply poorly documented. You can see it
being required elsewhere already (e.g., in line_log_init(), and quite
probably in other places that feed custom argv-arrays to
setup_revisions).

> In other words, it was not at all obvious that "(2) The memory remains
> useful until around the time of program exit" did not hold.  To a
> casual reader it instead looks like (2) does hold, and there's no
> documentation short of delving into the implementation of
> parse_pathspec() to convince a reader otherwise.  The documentation
> that is present leads to the opposite conclusion.

I guess the point I was trying to make is that I _didn't_ have that
opposite conclusion. A sane API does not create hidden memory ownership
dependencies (though also as I said, it does not hurt to double check if
you think it may be wrong, and certainly documenting also does not
hurt).

> The assertion (1) that this allocation is going to happen multiple
> times in a program isn't true either.  As you noted, we only read
> stdin once.  But that doesn't matter as long as (2) doesn't hold.

I don't think I asserted that this allocation happens multiple times. I
_did_ assert that the revision machinery may be called multiple times,
and I'd rather fix the leak than have people figure out the maximum
number of bytes we might leak in a given program. In other words, the
more important point of what I wrote is that we should not be reaching
for UNLEAK() at all in a function like this, even if you might be able
to justify it after analysis.

> TBH saying that I should write the one-sentence doc patch feels like
> a cop-out.  Just like it is not sustainable for those reviewers that
> are interested in good test coverage to be the only ones who write
> tests, I think we cannot avoid treating documentation of API contracts
> as a shared responsibility.

Hopefully the first paragraph I wrote above explains why I don't see
this as changing the contract at all (and therefore making the
documentation update orthogonal).

I agree that it's a good practice to leave the whole codebase a little
prettier than when you started, even if it's not strictly related to the
small change you want to make. And I hope my record of patches shows
that I follow through on that practice as a general rule.

But I also think it's easy for reviewers to ask for those orthogonal
changes, when it would be a similar amount of work for them to
communicate their suggestion in the form of a patch.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-21  4:41         ` Jonathan Nieder
@ 2017-09-21  4:50           ` Jeff King
  2017-09-21  5:10           ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Jeff King @ 2017-09-21  4:50 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Martin Ågren, git, Junio C Hamano

On Wed, Sep 20, 2017 at 09:41:12PM -0700, Jonathan Nieder wrote:

> > Well, no...the idea is that this is a function which leaks a bunch of
> > memory, and we shouldn't have to think hard about how often its leak can
> > be triggered or how severe it is. We should just fix it.
> 
> I forgot to say: thanks for writing such a pleasant patch.
> 
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

You're welcome. :)

> Here's such a patch.
>
> -- 8< --
> Subject: pathspec doc: parse_pathspec does not maintain references to args
> [...]

Yay!

This looks great to me. I think the whole pathspec API could use better
documentation, but this is certainly a good start.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
  2017-09-21  4:41         ` Jonathan Nieder
  2017-09-21  4:50           ` Jeff King
@ 2017-09-21  5:10           ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2017-09-21  5:10 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jeff King, Martin Ågren, git

Jonathan Nieder <jrnieder@gmail.com> writes:

> diff --git a/pathspec.c b/pathspec.c
> index e2a23ebc96..cdefdc7cc0 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -526,10 +526,6 @@ static void NORETURN unsupported_magic(const char *pattern,
>  	    pattern, sb.buf);
>  }
>  
> -/*
> - * Given command line arguments and a prefix, convert the input to
> - * pathspec. die() if any magic in magic_mask is used.
> - */
>  void parse_pathspec(struct pathspec *pathspec,
>  		    unsigned magic_mask, unsigned flags,
>  		    const char *prefix, const char **argv)
> diff --git a/pathspec.h b/pathspec.h
> index 60e6500401..6420d1080a 100644
> --- a/pathspec.h
> +++ b/pathspec.h
> @@ -70,6 +70,13 @@ struct pathspec {
>   */
>  #define PATHSPEC_LITERAL_PATH (1<<6)
>  
> +/*
> + * Given command line arguments and a prefix, convert the input to
> + * pathspec. die() if any magic in magic_mask is used.
> + *
> + * Any arguments used are copied. It is safe for the caller to modify
> + * or free 'prefix' and 'args' after calling this function.
> + */
>  extern void parse_pathspec(struct pathspec *pathspec,
>  			   unsigned magic_mask,
>  			   unsigned flags,

Obviously the extra text is better than not having any, but I
somehow found "Any arguments used" a bit unsatisfactory.  magic_mask
and flags are probably also copied ;-) but I wonder if *pathspec is
also copied?  The second sentence that singles out 'prefix' and 'args'
helps to remove such a confusion from a clueless reader like me, and
makes me wonder if can take advantage of the existence of them in a
more direct way.

	It is safe for the caller to modify or free 'prefix' and
	'args' after calling this function, as copies of them are
	stored in the pathspec structure.

or something like that, perhaps.  Adding "(which is freed by calling
clear_pathspec())" at the end might even better, as that function
does not currently have any docstring.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-09-21  5:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-20 19:47 [PATCH] revision: fix memory leaks with `struct cmdline_pathspec` Martin Ågren
2017-09-20 20:25 ` Jeff King
2017-09-20 20:36   ` [PATCH] revision: replace "struct cmdline_pathspec" with argv_array Jeff King
2017-09-20 22:48     ` Jonathan Nieder
2017-09-21  3:04       ` Jeff King
2017-09-21  3:49         ` Jonathan Nieder
2017-09-21  4:48           ` Jeff King
2017-09-21  4:41         ` Jonathan Nieder
2017-09-21  4:50           ` Jeff King
2017-09-21  5:10           ` Junio C Hamano
2017-09-21  3:57     ` Martin Ågren
2017-09-21  4:11     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).