git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] setup: add `clear_repository_format()`
@ 2018-12-18  7:25 Martin Ågren
  2018-12-18  7:25 ` [PATCH 1/3] setup: drop return value from `read_repository_format()` Martin Ågren
                   ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Martin Ågren @ 2018-12-18  7:25 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

Patch 3 adds `clear_repository_format()` to plug a few memory leaks
related to `struct repository_format`. In preparation for that, patch 2
tightens up some code which uses a possibly-undefined value from the
struct. Patch 1 drops a return value that no-one has ever used.

Cc-ing Peff, who introduced `struct repository_format`, and brian, who
introduced its `hash_algo` field (see patch 2).

Martin Ågren (3):
  setup: drop return value from `read_repository_format()`
  setup: do not use invalid `repository_format`
  setup: add `clear_repository_format()`

 cache.h      | 13 +++++++++----
 repository.c |  1 +
 setup.c      | 19 ++++++++++++++++---
 3 files changed, 26 insertions(+), 7 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/3] setup: drop return value from `read_repository_format()`
  2018-12-18  7:25 [PATCH 0/3] setup: add `clear_repository_format()` Martin Ågren
@ 2018-12-18  7:25 ` Martin Ågren
  2018-12-19 15:27   ` Jeff King
  2018-12-18  7:25 ` [PATCH 2/3] setup: do not use invalid `repository_format` Martin Ågren
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Martin Ågren @ 2018-12-18  7:25 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

No-one looks at the return value, so we might as well drop it. It's
still available as `format->version`.

In v1 of what became commit 2cc7c2c737 ("setup: refactor repo format
reading and verification", 2016-03-11), this function actually had
return type "void", but that was changed in v2. Almost three years
later, no-one has used this return value.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 I only discovered the full history after writing the patch. Had I known
 it from the beginning, maybe I'd have just skipped this step, but I was
 sufficiently disturbed by the redundant and unused return value that I
 dropped it before working on the actual meat of this series.

 cache.h | 7 +++----
 setup.c | 3 +--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/cache.h b/cache.h
index ca36b44ee0..8b9e592c65 100644
--- a/cache.h
+++ b/cache.h
@@ -974,11 +974,10 @@ struct repository_format {
 
 /*
  * Read the repository format characteristics from the config file "path" into
- * "format" struct. Returns the numeric version. On error, -1 is returned,
- * format->version is set to -1, and all other fields in the struct are
- * undefined.
+ * "format" struct. On error, format->version is set to -1, and all other
+ * fields in the struct are undefined.
  */
-int read_repository_format(struct repository_format *format, const char *path);
+void read_repository_format(struct repository_format *format, const char *path);
 
 /*
  * Verify that the repository described by repository_format is something we
diff --git a/setup.c b/setup.c
index 1be5037f12..27747af7a3 100644
--- a/setup.c
+++ b/setup.c
@@ -509,7 +509,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	return 0;
 }
 
-int read_repository_format(struct repository_format *format, const char *path)
+void read_repository_format(struct repository_format *format, const char *path)
 {
 	memset(format, 0, sizeof(*format));
 	format->version = -1;
@@ -517,7 +517,6 @@ int read_repository_format(struct repository_format *format, const char *path)
 	format->hash_algo = GIT_HASH_SHA1;
 	string_list_init(&format->unknown_extensions, 1);
 	git_config_from_file(check_repo_format, path, format);
-	return format->version;
 }
 
 int verify_repository_format(const struct repository_format *format,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 2/3] setup: do not use invalid `repository_format`
  2018-12-18  7:25 [PATCH 0/3] setup: add `clear_repository_format()` Martin Ågren
  2018-12-18  7:25 ` [PATCH 1/3] setup: drop return value from `read_repository_format()` Martin Ågren
@ 2018-12-18  7:25 ` Martin Ågren
  2018-12-19  0:18   ` brian m. carlson
  2018-12-19 15:38   ` Jeff King
  2018-12-18  7:25 ` [PATCH 3/3] setup: add `clear_repository_format()` Martin Ågren
  2019-01-14 18:34 ` [PATCH v2 0/3] " Martin Ågren
  3 siblings, 2 replies; 40+ messages in thread
From: Martin Ågren @ 2018-12-18  7:25 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

If `read_repository_format()` encounters an error, `format->version`
will be -1 and all other fields of `format` will be undefined. However,
in `setup_git_directory_gently()`, we use `repo_fmt.hash_algo`
regardless of the value of `repo_fmt.version`.

This can be observed by adding this to the end of
`read_repository_format()`:

	if (format->version == -1)
		format->hash_algo = 0; /* no-one should peek at this! */

This causes, e.g., "git branch -m q q2 without config should succeed" in
t3200 to fail with "fatal: Failed to resolve HEAD as a valid ref."
because it has moved .git/config out of the way and is now trying to use
a bad hash algorithm.

Check that `version` is non-negative before using `hash_algo`.

This patch adds no tests, but do note that if we skip this patch, the
next patch would cause existing tests to fail as outlined above.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 I fully admit to not understanding all of this setup code, neither in
 its current incarnation, nor in terms of an ideal end game. This check
 seems like a good thing to do though.

 setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup.c b/setup.c
index 27747af7a3..52c3c9d31f 100644
--- a/setup.c
+++ b/setup.c
@@ -1138,7 +1138,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository && repo_fmt.version > -1)
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 3/3] setup: add `clear_repository_format()`
  2018-12-18  7:25 [PATCH 0/3] setup: add `clear_repository_format()` Martin Ågren
  2018-12-18  7:25 ` [PATCH 1/3] setup: drop return value from `read_repository_format()` Martin Ågren
  2018-12-18  7:25 ` [PATCH 2/3] setup: do not use invalid `repository_format` Martin Ågren
@ 2018-12-18  7:25 ` Martin Ågren
  2018-12-19 15:48   ` Jeff King
  2019-01-14 18:34 ` [PATCH v2 0/3] " Martin Ågren
  3 siblings, 1 reply; 40+ messages in thread
From: Martin Ågren @ 2018-12-18  7:25 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

After we set up a `struct repository_format`, it owns various pieces of
allocated memory. We then either use those members, because we decide we
want to use the "candidate" repository format, or we discard the
candidate / scratch space. In the first case, we transfer ownership of
the memory to a few global variables. In the latter case, we just
silently drop the struct and end up leaking memory.

Introduce a function `clear_repository_format()` which frees the memory
the struct holds on to. Call it in the code paths where we currently
leak the memory. Also call it in the error path of
`read_repository_format()` to clean up any partial result.

For hygiene, we need to at least set the pointers that we free to NULL.
For future-proofing, let's zero the entire struct instead. It just means
that in the error path of `read_...()` we need to restore the error
sentinel in the `version` field.

We could take this opportunity to stop claiming that all fields except
`version` are undefined in case of an error. On the other hand, having
them defined as zero is not much better than having them undefined. We
could define them to some fallback configuration (`is_bare = -1` and
`hash_algo = GIT_HASH_SHA1`?), but "clear()" and/or "read()" seem like
the wrong places to enforce fallback configurations. Let's leave things
as "undefined" instead to encourage users to check `version`.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 The error state can always be defined later. Defining it now, then
 trying to backpedal, is probably not so fun. Filling the struct with
 non-zero values might help flush out bugs like the one fixed in the
 previous patch, but I'm wary of going that far in this patch.

 cache.h      |  6 ++++++
 repository.c |  1 +
 setup.c      | 14 ++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/cache.h b/cache.h
index 8b9e592c65..53ac01efa7 100644
--- a/cache.h
+++ b/cache.h
@@ -979,6 +979,12 @@ struct repository_format {
  */
 void read_repository_format(struct repository_format *format, const char *path);
 
+/*
+ * Free the memory held onto by `format`, but not the struct itself.
+ * (No need to use this after `read_repository_format()` fails.)
+ */
+void clear_repository_format(struct repository_format *format);
+
 /*
  * Verify that the repository described by repository_format is something we
  * can read. If it is, return 0. Otherwise, return -1, and "err" will describe
diff --git a/repository.c b/repository.c
index 5dd1486718..efa9d1d960 100644
--- a/repository.c
+++ b/repository.c
@@ -159,6 +159,7 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	clear_repository_format(&format);
 	return 0;
 
 error:
diff --git a/setup.c b/setup.c
index 52c3c9d31f..babe5ea156 100644
--- a/setup.c
+++ b/setup.c
@@ -517,6 +517,18 @@ void read_repository_format(struct repository_format *format, const char *path)
 	format->hash_algo = GIT_HASH_SHA1;
 	string_list_init(&format->unknown_extensions, 1);
 	git_config_from_file(check_repo_format, path, format);
+	if (format->version == -1) {
+		clear_repository_format(format);
+		format->version = -1;
+	}
+}
+
+void clear_repository_format(struct repository_format *format)
+{
+	string_list_clear(&format->unknown_extensions, 0);
+	free(format->work_tree);
+	free(format->partial_clone);
+	memset(format, 0, sizeof(*format));
 }
 
 int verify_repository_format(const struct repository_format *format,
@@ -1043,9 +1055,11 @@ int discover_git_directory(struct strbuf *commondir,
 		strbuf_release(&err);
 		strbuf_setlen(commondir, commondir_offset);
 		strbuf_setlen(gitdir, gitdir_offset);
+		clear_repository_format(&candidate);
 		return -1;
 	}
 
+	clear_repository_format(&candidate);
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/3] setup: do not use invalid `repository_format`
  2018-12-18  7:25 ` [PATCH 2/3] setup: do not use invalid `repository_format` Martin Ågren
@ 2018-12-19  0:18   ` brian m. carlson
  2018-12-19 21:43     ` Martin Ågren
  2018-12-19 15:38   ` Jeff King
  1 sibling, 1 reply; 40+ messages in thread
From: brian m. carlson @ 2018-12-19  0:18 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Jeff King

[-- Attachment #1: Type: text/plain, Size: 1085 bytes --]

On Tue, Dec 18, 2018 at 08:25:27AM +0100, Martin Ågren wrote:
>  I fully admit to not understanding all of this setup code, neither in
>  its current incarnation, nor in terms of an ideal end game. This check
>  seems like a good thing to do though.

It's definitely complex.

> diff --git a/setup.c b/setup.c
> index 27747af7a3..52c3c9d31f 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1138,7 +1138,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
>  			setup_git_env(gitdir);
>  		}
> -		if (startup_info->have_repository)
> +		if (startup_info->have_repository && repo_fmt.version > -1)
>  			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
>  	}

I think this change is fine, because we initialize the value in
the_repository elsewhere, and if there's no repository, this should
never have a value other than the default anyway.

I looked at the other patches in the series and thought they looked sane
as well.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3] setup: drop return value from `read_repository_format()`
  2018-12-18  7:25 ` [PATCH 1/3] setup: drop return value from `read_repository_format()` Martin Ågren
@ 2018-12-19 15:27   ` Jeff King
  2018-12-19 21:42     ` Martin Ågren
  2018-12-20  0:17     ` brian m. carlson
  0 siblings, 2 replies; 40+ messages in thread
From: Jeff King @ 2018-12-19 15:27 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, brian m . carlson

On Tue, Dec 18, 2018 at 08:25:26AM +0100, Martin Ågren wrote:

> No-one looks at the return value, so we might as well drop it. It's
> still available as `format->version`.
> 
> In v1 of what became commit 2cc7c2c737 ("setup: refactor repo format
> reading and verification", 2016-03-11), this function actually had
> return type "void", but that was changed in v2. Almost three years
> later, no-one has used this return value.

Hmm. If we have to pick one, I'd say that just returning a sane exit
value would be the more conventional thing to do. But looking at the
callers, many of them want to just pass the struct on to the verify
function.

That said, there is a long-standing curiosity here that we may want to
consider: read_repository_format() does not distinguish between these
three cases:

  1. the config file is missing

  2. the config file is present, but does not have a version field

  3. the config file is malformed, or we experience an I/O error
     (although I think there are still some cases that cause the config
     parser to die(), which may be sufficient)

The comment in check_repository_format_gently() implies that git-init
needs (1) to be a non-error for historical reasons. We could probably
tighten this up for other callers.

I think (2) probably should be an error, but note that it makes t1300
very unhappy, since it stomps all over .git/config. I'm not sure if any
real-world cases would be affected.

Case (3) I think we probably ought to do a better job of diagnosing. So
I wonder if the rule should be:

  - if we encounter a real error reading the config,
    read_repository_format() should return -1. Most callers should
    detect this and complain.

  - otherwise, a missing config returns 0 but puts "-1" into the version
    field

  - possibly verify_repository_format() should issue a warning when it
    sees "version == -1" and then rewrite the result into "0"

I dunno. This is one of those dark corners of the code where we appear
to do the wrong thing, but nobody seems to have noticed or cared much,
and changing it runs the risk of breaking some obscure cases. I'm not
sure if we should bite the bullet and try to address that, or just back
away slowly and pretend we never looked at it. ;)

> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  I only discovered the full history after writing the patch. Had I known
>  it from the beginning, maybe I'd have just skipped this step, but I was
>  sufficiently disturbed by the redundant and unused return value that I
>  dropped it before working on the actual meat of this series.
> 
>  cache.h | 7 +++----
>  setup.c | 3 +--
>  2 files changed, 4 insertions(+), 6 deletions(-)

FWIW, the patch itself seems fine, and obviously doesn't make anything
worse on its own. The question is just whether we want to do more
cleanup here.

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/3] setup: do not use invalid `repository_format`
  2018-12-18  7:25 ` [PATCH 2/3] setup: do not use invalid `repository_format` Martin Ågren
  2018-12-19  0:18   ` brian m. carlson
@ 2018-12-19 15:38   ` Jeff King
  2018-12-19 21:46     ` Martin Ågren
  2018-12-20  0:21     ` brian m. carlson
  1 sibling, 2 replies; 40+ messages in thread
From: Jeff King @ 2018-12-19 15:38 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, brian m . carlson

On Tue, Dec 18, 2018 at 08:25:27AM +0100, Martin Ågren wrote:

> If `read_repository_format()` encounters an error, `format->version`
> will be -1 and all other fields of `format` will be undefined. However,
> in `setup_git_directory_gently()`, we use `repo_fmt.hash_algo`
> regardless of the value of `repo_fmt.version`.
> 
> This can be observed by adding this to the end of
> `read_repository_format()`:
> 
> 	if (format->version == -1)
> 		format->hash_algo = 0; /* no-one should peek at this! */
> 
> This causes, e.g., "git branch -m q q2 without config should succeed" in
> t3200 to fail with "fatal: Failed to resolve HEAD as a valid ref."
> because it has moved .git/config out of the way and is now trying to use
> a bad hash algorithm.
> 
> Check that `version` is non-negative before using `hash_algo`.
> 
> This patch adds no tests, but do note that if we skip this patch, the
> next patch would cause existing tests to fail as outlined above.
> 
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>

Hmm. It looks like we never set repo_fmt.hash_algo to anything besides
GIT_HASH_SHA1 anyway. I guess the existing field is really just there in
preparation for us eventually respecting extensions.hashAlgorithm (or
whatever it's called).

Given what I said in my previous email about repos with a missing
"version" field, I wondered if this patch would be breaking config like:

  [core]
  # no repositoryformatversion!
  [extensions]
  hashAlgorithm = sha256

But I'd argue that:

  1. That's pretty dumb config that we shouldn't need to support. Even
     if we care about handling the missing version for historical repos,
     they wouldn't be talking sha256.

  2. Arguably we should not even look at extensions.* unless we see a
     version >= 1. But we do process them as we parse the config file.
     This is mostly an oversight, I think. We have to handle them as we
     see them, because they may come out of order with respect to the
     repositoryformatversion field. But we could put them into a
     string_list, and then only process them after we've decided which
     version we have.

So I think your patch is doing the right thing, and won't hurt any real
cases. But (of course) there are more opportunities to clean things up.

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/3] setup: add `clear_repository_format()`
  2018-12-18  7:25 ` [PATCH 3/3] setup: add `clear_repository_format()` Martin Ågren
@ 2018-12-19 15:48   ` Jeff King
  2018-12-19 21:49     ` Martin Ågren
  0 siblings, 1 reply; 40+ messages in thread
From: Jeff King @ 2018-12-19 15:48 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, brian m . carlson

On Tue, Dec 18, 2018 at 08:25:28AM +0100, Martin Ågren wrote:

> After we set up a `struct repository_format`, it owns various pieces of
> allocated memory. We then either use those members, because we decide we
> want to use the "candidate" repository format, or we discard the
> candidate / scratch space. In the first case, we transfer ownership of
> the memory to a few global variables. In the latter case, we just
> silently drop the struct and end up leaking memory.
> 
> Introduce a function `clear_repository_format()` which frees the memory
> the struct holds on to. Call it in the code paths where we currently
> leak the memory. Also call it in the error path of
> `read_repository_format()` to clean up any partial result.
> 
> For hygiene, we need to at least set the pointers that we free to NULL.
> For future-proofing, let's zero the entire struct instead. It just means
> that in the error path of `read_...()` we need to restore the error
> sentinel in the `version` field.

This seems reasonable, and I very much agree on the zero-ing (even
though it _shouldn't_ matter due to the "undefined" rule). That also
makes it safe to clear() multiple times, which is a nice property.

> +void clear_repository_format(struct repository_format *format)
> +{
> +	string_list_clear(&format->unknown_extensions, 0);
> +	free(format->work_tree);
> +	free(format->partial_clone);
> +	memset(format, 0, sizeof(*format));
>  }

For the callers that actually pick the values out, I think it might be a
little less error-prone if they actually copied the strings and then
called clear_repository_format(). That avoids leaks of values that they
didn't know or care about (and the cost of an extra strdup for
repository setup is not a big deal).

Something like this on top of your patch, I guess (with the idea being
that functions which return an error would clear the format, but a
"successful" one would get returned back up the stack to
setup_git_directory_gently(), which then clears it before returning.

-- >8 --
diff --git a/setup.c b/setup.c
index babe5ea156..a5699f9ee6 100644
--- a/setup.c
+++ b/setup.c
@@ -470,6 +470,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 			warning("%s", err.buf);
 			strbuf_release(&err);
 			*nongit_ok = -1;
+			clear_repository_format(candidate);
 			return -1;
 		}
 		die("%s", err.buf);
@@ -499,7 +500,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 		}
 		if (candidate->work_tree) {
 			free(git_work_tree_cfg);
-			git_work_tree_cfg = candidate->work_tree;
+			git_work_tree_cfg = xstrdup(candidate->work_tree);
 			inside_work_tree = -1;
 		}
 	} else {
@@ -1158,6 +1159,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 
 	strbuf_release(&dir);
 	strbuf_release(&gitdir);
+	clear_repository_format(&repo_fmt);
 
 	return prefix;
 }

-Peff

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3] setup: drop return value from `read_repository_format()`
  2018-12-19 15:27   ` Jeff King
@ 2018-12-19 21:42     ` Martin Ågren
  2018-12-20  0:17     ` brian m. carlson
  1 sibling, 0 replies; 40+ messages in thread
From: Martin Ågren @ 2018-12-19 21:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, brian m . carlson

On Wed, 19 Dec 2018 at 16:27, Jeff King <peff@peff.net> wrote:
>
> On Tue, Dec 18, 2018 at 08:25:26AM +0100, Martin Ågren wrote:
>
> > No-one looks at the return value, so we might as well drop it. It's
> > still available as `format->version`.
>
> Hmm. If we have to pick one, I'd say that just returning a sane exit
> value would be the more conventional thing to do. But looking at the
> callers, many of them want to just pass the struct on to the verify
> function.
>
> That said, there is a long-standing curiosity here that we may want to
> consider: read_repository_format() does not distinguish between these
> three cases:

[snip several valuable insights]

> I dunno. This is one of those dark corners of the code where we appear
> to do the wrong thing, but nobody seems to have noticed or cared much,
> and changing it runs the risk of breaking some obscure cases. I'm not
> sure if we should bite the bullet and try to address that, or just back
> away slowly and pretend we never looked at it. ;)

That was my reaction when I first dug into this. :-/ It's only now that
I've been brave enough to even try to dig through the first layer.

> FWIW, the patch itself seems fine, and obviously doesn't make anything
> worse on its own. The question is just whether we want to do more
> cleanup here.

Well, if we do want to make more cleanups around here, one of the more
obvious ideas is to actually use the return value. If such a cleanup is
going to start with a (moral) revert of this patch, then we're probably
better off just dropping this patch.

Thanks for your thoughtful response


Martin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/3] setup: do not use invalid `repository_format`
  2018-12-19  0:18   ` brian m. carlson
@ 2018-12-19 21:43     ` Martin Ågren
  0 siblings, 0 replies; 40+ messages in thread
From: Martin Ågren @ 2018-12-19 21:43 UTC (permalink / raw)
  To: brian m. carlson, Git Mailing List; +Cc: Jeff King

On Wed, 19 Dec 2018 at 01:18, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> I think this change is fine, because we initialize the value in
> the_repository elsewhere, and if there's no repository, this should
> never have a value other than the default anyway.

Thanks, it feels good that this patch matches how you think about the
`hash_algo` field.

> I looked at the other patches in the series and thought they looked sane
> as well.

Thanks for a review, I appreciate it.


Martin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/3] setup: do not use invalid `repository_format`
  2018-12-19 15:38   ` Jeff King
@ 2018-12-19 21:46     ` Martin Ågren
  2018-12-19 23:17       ` Jeff King
  2018-12-20  0:21     ` brian m. carlson
  1 sibling, 1 reply; 40+ messages in thread
From: Martin Ågren @ 2018-12-19 21:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, brian m . carlson

On Wed, 19 Dec 2018 at 16:38, Jeff King <peff@peff.net> wrote:
>
> On Tue, Dec 18, 2018 at 08:25:27AM +0100, Martin Ågren wrote:
>
> > Check that `version` is non-negative before using `hash_algo`.

> Hmm. It looks like we never set repo_fmt.hash_algo to anything besides
> GIT_HASH_SHA1 anyway. I guess the existing field is really just there in
> preparation for us eventually respecting extensions.hashAlgorithm (or
> whatever it's called).

That was my understanding as well. Maybe I should have spelled it out.

I think of the diff of this patch as "let's check `foo->valid` before we
`use(foo->bar)`", which should only be able to regress in case foo isn't
valid. And ...

> Given what I said in my previous email about repos with a missing
> "version" field, I wondered if this patch would be breaking config like:
>
>   [core]
>   # no repositoryformatversion!
>   [extensions]
>   hashAlgorithm = sha256
>
> But I'd argue that:
>
>   1. That's pretty dumb config that we shouldn't need to support. Even
>      if we care about handling the missing version for historical repos,
>      they wouldn't be talking sha256.

... this matches my thinking.

>   2. Arguably we should not even look at extensions.* unless we see a
>      version >= 1. But we do process them as we parse the config file.
>      This is mostly an oversight, I think. We have to handle them as we
>      see them, because they may come out of order with respect to the
>      repositoryformatversion field. But we could put them into a
>      string_list, and then only process them after we've decided which
>      version we have.

I hadn't thought too much about this. I guess that for some simpler
extensions--versions dependencies it would be feasible to first parse
everything, then, depending on the version we've identified, forget
about any "irrelevant" extensions. Again, nothing I've thought much
about, and seems to be safely out of scope for this patch.


> So I think your patch is doing the right thing, and won't hurt any real
> cases. But (of course) there are more opportunities to clean things up.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/3] setup: add `clear_repository_format()`
  2018-12-19 15:48   ` Jeff King
@ 2018-12-19 21:49     ` Martin Ågren
  0 siblings, 0 replies; 40+ messages in thread
From: Martin Ågren @ 2018-12-19 21:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, brian m . carlson

On Wed, 19 Dec 2018 at 16:48, Jeff King <peff@peff.net> wrote:
>
> On Tue, Dec 18, 2018 at 08:25:28AM +0100, Martin Ågren wrote:
>
> > +void clear_repository_format(struct repository_format *format)
> > +{
> > +     string_list_clear(&format->unknown_extensions, 0);
> > +     free(format->work_tree);
> > +     free(format->partial_clone);
> > +     memset(format, 0, sizeof(*format));
> >  }
>
> For the callers that actually pick the values out, I think it might be a
> little less error-prone if they actually copied the strings and then
> called clear_repository_format(). That avoids leaks of values that they
> didn't know or care about (and the cost of an extra strdup for
> repository setup is not a big deal).
>
> Something like this on top of your patch, I guess (with the idea being
> that functions which return an error would clear the format, but a
> "successful" one would get returned back up the stack to
> setup_git_directory_gently(), which then clears it before returning.

Thanks for the suggestion. I'll ponder 1) how to go about this
robustifying, 2) how to present the result as part of a v2 series.

To Junio on the sidelines in a cast (hope you're feeling better!):
you can expect a v2 of this series.


Martin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/3] setup: do not use invalid `repository_format`
  2018-12-19 21:46     ` Martin Ågren
@ 2018-12-19 23:17       ` Jeff King
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff King @ 2018-12-19 23:17 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, brian m . carlson

On Wed, Dec 19, 2018 at 10:46:52PM +0100, Martin Ågren wrote:

> >   2. Arguably we should not even look at extensions.* unless we see a
> >      version >= 1. But we do process them as we parse the config file.
> >      This is mostly an oversight, I think. We have to handle them as we
> >      see them, because they may come out of order with respect to the
> >      repositoryformatversion field. But we could put them into a
> >      string_list, and then only process them after we've decided which
> >      version we have.
> 
> I hadn't thought too much about this. I guess that for some simpler
> extensions--versions dependencies it would be feasible to first parse
> everything, then, depending on the version we've identified, forget
> about any "irrelevant" extensions. Again, nothing I've thought much
> about, and seems to be safely out of scope for this patch.

The decision is actually pretty straight-forward: if version < 1, ignore
extensions, otherwise respect them (and complain about any we don't know
about).

So I think we could just do in verify_repository_format() something
like:

  if (version < 1) {
    /* "undo" any extensions we might have parsed */
    data->precious_objects = 0;
    FREE_AND_NULL(data->partial_clone);
    data->worktree_config = 0;
    data->hash_algo = GIT_HASH_SHA1;
  } else {
    /* complain about unknown extension; we already do this! */
  }

It's a little ugly to have to know about all the extensions here, but we
already initialize them in read_repository_format(). We could probably
factor that out into a shared function.

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3] setup: drop return value from `read_repository_format()`
  2018-12-19 15:27   ` Jeff King
  2018-12-19 21:42     ` Martin Ågren
@ 2018-12-20  0:17     ` brian m. carlson
  2018-12-20  2:52       ` Jeff King
  1 sibling, 1 reply; 40+ messages in thread
From: brian m. carlson @ 2018-12-20  0:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Ågren, git

[-- Attachment #1: Type: text/plain, Size: 831 bytes --]

On Wed, Dec 19, 2018 at 10:27:35AM -0500, Jeff King wrote:
> I dunno. This is one of those dark corners of the code where we appear
> to do the wrong thing, but nobody seems to have noticed or cared much,
> and changing it runs the risk of breaking some obscure cases. I'm not
> sure if we should bite the bullet and try to address that, or just back
> away slowly and pretend we never looked at it. ;)

I will point out that with the SHA-256 work, reading the config file
becomes essential for SHA-256 repositories, because we need to know the
object format. Removing the config file leads to things blowing up in a
bad way (what specific bad way I don't remember).

That may influence the direction we want to take in this work, or not.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/3] setup: do not use invalid `repository_format`
  2018-12-19 15:38   ` Jeff King
  2018-12-19 21:46     ` Martin Ågren
@ 2018-12-20  0:21     ` brian m. carlson
  1 sibling, 0 replies; 40+ messages in thread
From: brian m. carlson @ 2018-12-20  0:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Ågren, git

[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

On Wed, Dec 19, 2018 at 10:38:41AM -0500, Jeff King wrote:
> Hmm. It looks like we never set repo_fmt.hash_algo to anything besides
> GIT_HASH_SHA1 anyway. I guess the existing field is really just there in
> preparation for us eventually respecting extensions.hashAlgorithm (or
> whatever it's called).

Yeah, it is.

I haven't tested, but since we just read the value of
extensions.objectFormat, this patch shouldn't have any effect on the
SHA-256 code. The default remains SHA-1 if a value isn't specified
somehow.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3] setup: drop return value from `read_repository_format()`
  2018-12-20  0:17     ` brian m. carlson
@ 2018-12-20  2:52       ` Jeff King
  2018-12-20  3:45         ` brian m. carlson
  0 siblings, 1 reply; 40+ messages in thread
From: Jeff King @ 2018-12-20  2:52 UTC (permalink / raw)
  To: brian m. carlson, Martin Ågren, git

On Thu, Dec 20, 2018 at 12:17:53AM +0000, brian m. carlson wrote:

> On Wed, Dec 19, 2018 at 10:27:35AM -0500, Jeff King wrote:
> > I dunno. This is one of those dark corners of the code where we appear
> > to do the wrong thing, but nobody seems to have noticed or cared much,
> > and changing it runs the risk of breaking some obscure cases. I'm not
> > sure if we should bite the bullet and try to address that, or just back
> > away slowly and pretend we never looked at it. ;)
> 
> I will point out that with the SHA-256 work, reading the config file
> becomes essential for SHA-256 repositories, because we need to know the
> object format. Removing the config file leads to things blowing up in a
> bad way (what specific bad way I don't remember).
> 
> That may influence the direction we want to take in this work, or not.

Wouldn't we just treat that the same way we do now? I.e., assume the
default of sha1, just like we assume repositoryformatversion==0?

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3] setup: drop return value from `read_repository_format()`
  2018-12-20  2:52       ` Jeff King
@ 2018-12-20  3:45         ` brian m. carlson
  2018-12-20 14:53           ` Jeff King
  0 siblings, 1 reply; 40+ messages in thread
From: brian m. carlson @ 2018-12-20  3:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Ågren, git

[-- Attachment #1: Type: text/plain, Size: 1805 bytes --]

On Wed, Dec 19, 2018 at 09:52:12PM -0500, Jeff King wrote:
> On Thu, Dec 20, 2018 at 12:17:53AM +0000, brian m. carlson wrote:
> 
> > On Wed, Dec 19, 2018 at 10:27:35AM -0500, Jeff King wrote:
> > > I dunno. This is one of those dark corners of the code where we appear
> > > to do the wrong thing, but nobody seems to have noticed or cared much,
> > > and changing it runs the risk of breaking some obscure cases. I'm not
> > > sure if we should bite the bullet and try to address that, or just back
> > > away slowly and pretend we never looked at it. ;)
> > 
> > I will point out that with the SHA-256 work, reading the config file
> > becomes essential for SHA-256 repositories, because we need to know the
> > object format. Removing the config file leads to things blowing up in a
> > bad way (what specific bad way I don't remember).
> > 
> > That may influence the direction we want to take in this work, or not.
> 
> Wouldn't we just treat that the same way we do now? I.e., assume the
> default of sha1, just like we assume repositoryformatversion==0?

Yeah, we'll default to SHA-1, but the repository will be broken. HEAD
can't be read. Trying to run git status dies with "fatal: Unknown index
entry format". And so on. We've written data with 64-character object
IDs, which can't be read by Git in SHA-1 mode.

My point is essentially that in an SHA-256 repository, the config file
isn't optional anymore. We probably need to consider that and error out
in more situations (e.g. unreadable file or I/O error) instead of
silently falling back to the defaults, since failing loudly in a visible
way is better than having the user try to figure out why the index is
suddenly "corrupt".
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3] setup: drop return value from `read_repository_format()`
  2018-12-20  3:45         ` brian m. carlson
@ 2018-12-20 14:53           ` Jeff King
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff King @ 2018-12-20 14:53 UTC (permalink / raw)
  To: brian m. carlson, Martin Ågren, git

On Thu, Dec 20, 2018 at 03:45:55AM +0000, brian m. carlson wrote:

> > > I will point out that with the SHA-256 work, reading the config file
> > > becomes essential for SHA-256 repositories, because we need to know the
> > > object format. Removing the config file leads to things blowing up in a
> > > bad way (what specific bad way I don't remember).
> > > 
> > > That may influence the direction we want to take in this work, or not.
> > 
> > Wouldn't we just treat that the same way we do now? I.e., assume the
> > default of sha1, just like we assume repositoryformatversion==0?
> 
> Yeah, we'll default to SHA-1, but the repository will be broken. HEAD
> can't be read. Trying to run git status dies with "fatal: Unknown index
> entry format". And so on. We've written data with 64-character object
> IDs, which can't be read by Git in SHA-1 mode.

Oh, I see. Yes, if you have a SHA-256 repository and you don't tell
anybody (via a config entry), then everything will fail to work. That
seems like a perfectly reasonable outcome to me.

> My point is essentially that in an SHA-256 repository, the config file
> isn't optional anymore. We probably need to consider that and error out
> in more situations (e.g. unreadable file or I/O error) instead of
> silently falling back to the defaults, since failing loudly in a visible
> way is better than having the user try to figure out why the index is
> suddenly "corrupt".

Yes, I agree that ideally we'd produce a better error message. I'd just
be wary of breaking compatibility for the existing cases by making new
requirements when we don't yet suspect the repo is SHA-256.

When we see such a corruption, would it be possible to poke at the data
as if it were the old SHA-1 format, and if _that_ looks sane, suggest to
the user what the problem might be? That would help a number of cases
beyond this one (i.e., you're missing config, you have config but it has
the wrong repo format, you're missing the correct extensions field,
etc).

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/3] setup: add `clear_repository_format()`
  2018-12-18  7:25 [PATCH 0/3] setup: add `clear_repository_format()` Martin Ågren
                   ` (2 preceding siblings ...)
  2018-12-18  7:25 ` [PATCH 3/3] setup: add `clear_repository_format()` Martin Ågren
@ 2019-01-14 18:34 ` Martin Ågren
  2019-01-14 18:34   ` [PATCH v2 1/3] setup: free old value before setting `work_tree` Martin Ågren
                     ` (2 more replies)
  3 siblings, 3 replies; 40+ messages in thread
From: Martin Ågren @ 2019-01-14 18:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

This patch series addresses memory leaks related to `struct
repository_format`. Compared to v1 [1], this v2 has dropped the first
patch ("setup: drop return value from `read_repository_format()`") and
added another patch 1/3 (which solves a different problem).

Patch 2/3 is unchanged. Patch 3/3 now calls `clear_...()` not only in
the error paths, but in all paths, as suggested in the review of v1.

Thanks to Peff and brian for reviewing v1.

There's a minor textual conflict with ed/simplify-setup-git-dir. A merge
of these two topics still passes the test suite (and these leaks remain
plugged).

[1] https://public-inbox.org/git/20181218072528.3870492-1-martin.agren@gmail.com/#t

Martin Ågren (3):
  setup: free old value before setting `work_tree`
  setup: do not use invalid `repository_format`
  setup: add `clear_repository_format()`

 cache.h           | 12 ++++++++++++
 builtin/init-db.c |  3 ++-
 repository.c      |  3 ++-
 setup.c           | 33 ++++++++++++++++++++++++---------
 worktree.c        |  4 +++-
 5 files changed, 43 insertions(+), 12 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 1/3] setup: free old value before setting `work_tree`
  2019-01-14 18:34 ` [PATCH v2 0/3] " Martin Ågren
@ 2019-01-14 18:34   ` Martin Ågren
  2019-01-14 18:34   ` [PATCH v2 2/3] setup: do not use invalid `repository_format` Martin Ågren
  2019-01-14 18:34   ` [PATCH v2 3/3] setup: add `clear_repository_format()` Martin Ågren
  2 siblings, 0 replies; 40+ messages in thread
From: Martin Ågren @ 2019-01-14 18:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

Before assigning to `data->work_tree` in `read_worktree_config()`, free
any value we might already have picked up, so that we do not leak it.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.c b/setup.c
index 1be5037f12..bb633942bb 100644
--- a/setup.c
+++ b/setup.c
@@ -411,6 +411,7 @@ static int read_worktree_config(const char *var, const char *value, void *vdata)
 	} else if (strcmp(var, "core.worktree") == 0) {
 		if (!value)
 			return config_error_nonbool(var);
+		free(data->work_tree);
 		data->work_tree = xstrdup(value);
 	}
 	return 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] setup: do not use invalid `repository_format`
  2019-01-14 18:34 ` [PATCH v2 0/3] " Martin Ågren
  2019-01-14 18:34   ` [PATCH v2 1/3] setup: free old value before setting `work_tree` Martin Ågren
@ 2019-01-14 18:34   ` Martin Ågren
  2019-01-15 19:31     ` Jeff King
  2019-01-14 18:34   ` [PATCH v2 3/3] setup: add `clear_repository_format()` Martin Ågren
  2 siblings, 1 reply; 40+ messages in thread
From: Martin Ågren @ 2019-01-14 18:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

If `read_repository_format()` encounters an error, `format->version`
will be -1 and all other fields of `format` will be undefined. However,
in `setup_git_directory_gently()`, we use `repo_fmt.hash_algo`
regardless of the value of `repo_fmt.version`.

This can be observed by adding this to the end of
`read_repository_format()`:

	if (format->version == -1)
		format->hash_algo = 0; /* no-one should peek at this! */

This causes, e.g., "git branch -m q q2 without config should succeed" in
t3200 to fail with "fatal: Failed to resolve HEAD as a valid ref."
because it has moved .git/config out of the way and is now trying to use
a bad hash algorithm.

Check that `version` is non-negative before using `hash_algo`.

This patch adds no tests, but do note that if we skip this patch, the
next patch would cause existing tests to fail as outlined above.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup.c b/setup.c
index bb633942bb..4d3d67c50b 100644
--- a/setup.c
+++ b/setup.c
@@ -1140,7 +1140,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository && repo_fmt.version > -1)
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 3/3] setup: add `clear_repository_format()`
  2019-01-14 18:34 ` [PATCH v2 0/3] " Martin Ågren
  2019-01-14 18:34   ` [PATCH v2 1/3] setup: free old value before setting `work_tree` Martin Ågren
  2019-01-14 18:34   ` [PATCH v2 2/3] setup: do not use invalid `repository_format` Martin Ågren
@ 2019-01-14 18:34   ` Martin Ågren
  2 siblings, 0 replies; 40+ messages in thread
From: Martin Ågren @ 2019-01-14 18:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

After we set up a `struct repository_format`, it owns various pieces of
allocated memory. We then either use those members, because we decide we
want to use the "candidate" repository format, or we discard the
candidate / scratch space. In the first case, we transfer ownership of
the memory to a few global variables. In the latter case, we just
silently drop the struct and end up leaking memory.

Fixing these memory leaks is complicated by two issues:

  1) As described above, we "steal" the fields of the struct in case of
     success.

  2) We might end up calling `read_repository_format()` more than once
     -- as we enter it a second time, we lose track of our pointers and
     leak memory. As a further complication, we do not initialize (zero)
     the structs before calling `read_...()` so as we first enter it, we
     cannot blindly free the pointers in it.

To free this memory, in light of (1), we must either carefully cover all
error paths but no success paths; or we should stop grabbing the
pointers. To address 2), we must either zero the struct before calling
`read_repository_format()`, or try to keep track of when we should zero
it and when we should first free the memory.

Introduce `REPO_FORMAT_INIT` and `clear_repository_format()` to be used
on each side of `read_repository_format()`. Make the users duplicate the
memory from the structs, rather than copying the pointers.

Call `clear_...()` at the start of `read_...()` instead of just zeroing
the struct. In the error path of the same function, be sure to restore
the error sentinel after we clear it with the rest of the struct.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 I do wonder if introducing and depending on `REPOSITORY_FORMAT_INIT`
 like this is 100% sane. Out-of-tree users could be in for a nasty
 surprise. :-/

 cache.h           | 12 ++++++++++++
 builtin/init-db.c |  3 ++-
 repository.c      |  3 ++-
 setup.c           | 30 ++++++++++++++++++++++--------
 worktree.c        |  4 +++-
 5 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index ca36b44ee0..3ef63d27c4 100644
--- a/cache.h
+++ b/cache.h
@@ -972,6 +972,12 @@ struct repository_format {
 	struct string_list unknown_extensions;
 };
 
+/**
+ * Always use this to initialize a `struct repository_format`
+ * to a well-defined state before calling `read_repository()`.
+ */
+#define REPOSITORY_FORMAT_INIT { 0 }
+
 /*
  * Read the repository format characteristics from the config file "path" into
  * "format" struct. Returns the numeric version. On error, -1 is returned,
@@ -980,6 +986,12 @@ struct repository_format {
  */
 int read_repository_format(struct repository_format *format, const char *path);
 
+/*
+ * Free the memory held onto by `format`, but not the struct itself.
+ * (No need to use this after `read_repository_format()` fails.)
+ */
+void clear_repository_format(struct repository_format *format);
+
 /*
  * Verify that the repository described by repository_format is something we
  * can read. If it is, return 0. Otherwise, return -1, and "err" will describe
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 41faffd28d..04c60eaad5 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -96,7 +96,7 @@ static void copy_templates(const char *template_dir)
 	struct strbuf path = STRBUF_INIT;
 	struct strbuf template_path = STRBUF_INIT;
 	size_t template_len;
-	struct repository_format template_format;
+	struct repository_format template_format = REPOSITORY_FORMAT_INIT;
 	struct strbuf err = STRBUF_INIT;
 	DIR *dir;
 	char *to_free = NULL;
@@ -148,6 +148,7 @@ static void copy_templates(const char *template_dir)
 	free(to_free);
 	strbuf_release(&path);
 	strbuf_release(&template_path);
+	clear_repository_format(&template_format);
 }
 
 static int git_init_db_config(const char *k, const char *v, void *cb)
diff --git a/repository.c b/repository.c
index 7b02e1dffa..df88705574 100644
--- a/repository.c
+++ b/repository.c
@@ -148,7 +148,7 @@ int repo_init(struct repository *repo,
 	      const char *gitdir,
 	      const char *worktree)
 {
-	struct repository_format format;
+	struct repository_format format = REPOSITORY_FORMAT_INIT;
 	memset(repo, 0, sizeof(*repo));
 
 	repo->objects = raw_object_store_new();
@@ -165,6 +165,7 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	clear_repository_format(&format);
 	return 0;
 
 error:
diff --git a/setup.c b/setup.c
index 4d3d67c50b..70d9007ae5 100644
--- a/setup.c
+++ b/setup.c
@@ -477,7 +477,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	repository_format_partial_clone = candidate->partial_clone;
+	repository_format_partial_clone = xstrdup_or_null(candidate->partial_clone);
 	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 
@@ -500,11 +500,9 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 		}
 		if (candidate->work_tree) {
 			free(git_work_tree_cfg);
-			git_work_tree_cfg = candidate->work_tree;
+			git_work_tree_cfg = xstrdup(candidate->work_tree);
 			inside_work_tree = -1;
 		}
-	} else {
-		free(candidate->work_tree);
 	}
 
 	return 0;
@@ -512,15 +510,27 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 
 int read_repository_format(struct repository_format *format, const char *path)
 {
-	memset(format, 0, sizeof(*format));
+	clear_repository_format(format);
 	format->version = -1;
 	format->is_bare = -1;
 	format->hash_algo = GIT_HASH_SHA1;
 	string_list_init(&format->unknown_extensions, 1);
 	git_config_from_file(check_repo_format, path, format);
+	if (format->version == -1) {
+		clear_repository_format(format);
+		format->version = -1;
+	}
 	return format->version;
 }
 
+void clear_repository_format(struct repository_format *format)
+{
+	string_list_clear(&format->unknown_extensions, 0);
+	free(format->work_tree);
+	free(format->partial_clone);
+	memset(format, 0, sizeof(*format));
+}
+
 int verify_repository_format(const struct repository_format *format,
 			     struct strbuf *err)
 {
@@ -1008,7 +1018,7 @@ int discover_git_directory(struct strbuf *commondir,
 	struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT;
 	size_t gitdir_offset = gitdir->len, cwd_len;
 	size_t commondir_offset = commondir->len;
-	struct repository_format candidate;
+	struct repository_format candidate = REPOSITORY_FORMAT_INIT;
 
 	if (strbuf_getcwd(&dir))
 		return -1;
@@ -1045,9 +1055,11 @@ int discover_git_directory(struct strbuf *commondir,
 		strbuf_release(&err);
 		strbuf_setlen(commondir, commondir_offset);
 		strbuf_setlen(gitdir, gitdir_offset);
+		clear_repository_format(&candidate);
 		return -1;
 	}
 
+	clear_repository_format(&candidate);
 	return 0;
 }
 
@@ -1056,7 +1068,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 	static struct strbuf cwd = STRBUF_INIT;
 	struct strbuf dir = STRBUF_INIT, gitdir = STRBUF_INIT;
 	const char *prefix;
-	struct repository_format repo_fmt;
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 
 	/*
 	 * We may have read an incomplete configuration before
@@ -1146,6 +1158,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 
 	strbuf_release(&dir);
 	strbuf_release(&gitdir);
+	clear_repository_format(&repo_fmt);
 
 	return prefix;
 }
@@ -1203,9 +1216,10 @@ int git_config_perm(const char *var, const char *value)
 
 void check_repository_format(void)
 {
-	struct repository_format repo_fmt;
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 	check_repository_format_gently(get_git_dir(), &repo_fmt, NULL);
 	startup_info->have_repository = 1;
+	clear_repository_format(&repo_fmt);
 }
 
 /*
diff --git a/worktree.c b/worktree.c
index d6a0ee7f73..b45bfeb9d3 100644
--- a/worktree.c
+++ b/worktree.c
@@ -444,7 +444,7 @@ int submodule_uses_worktrees(const char *path)
 	DIR *dir;
 	struct dirent *d;
 	int ret = 0;
-	struct repository_format format;
+	struct repository_format format = REPOSITORY_FORMAT_INIT;
 
 	submodule_gitdir = git_pathdup_submodule(path, "%s", "");
 	if (!submodule_gitdir)
@@ -462,8 +462,10 @@ int submodule_uses_worktrees(const char *path)
 	read_repository_format(&format, sb.buf);
 	if (format.version != 0) {
 		strbuf_release(&sb);
+		clear_repository_format(&format);
 		return 1;
 	}
+	clear_repository_format(&format);
 
 	/* Replace config by worktrees. */
 	strbuf_setlen(&sb, sb.len - strlen("config"));
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] setup: do not use invalid `repository_format`
  2019-01-14 18:34   ` [PATCH v2 2/3] setup: do not use invalid `repository_format` Martin Ågren
@ 2019-01-15 19:31     ` Jeff King
  2019-01-17  6:31       ` Martin Ågren
  0 siblings, 1 reply; 40+ messages in thread
From: Jeff King @ 2019-01-15 19:31 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, brian m . carlson

On Mon, Jan 14, 2019 at 07:34:56PM +0100, Martin Ågren wrote:

> If `read_repository_format()` encounters an error, `format->version`
> will be -1 and all other fields of `format` will be undefined. However,
> in `setup_git_directory_gently()`, we use `repo_fmt.hash_algo`
> regardless of the value of `repo_fmt.version`.
> 
> This can be observed by adding this to the end of
> `read_repository_format()`:
> 
> 	if (format->version == -1)
> 		format->hash_algo = 0; /* no-one should peek at this! */
> 
> This causes, e.g., "git branch -m q q2 without config should succeed" in
> t3200 to fail with "fatal: Failed to resolve HEAD as a valid ref."
> because it has moved .git/config out of the way and is now trying to use
> a bad hash algorithm.
> 
> Check that `version` is non-negative before using `hash_algo`.
> 
> This patch adds no tests, but do note that if we skip this patch, the
> next patch would cause existing tests to fail as outlined above.

I'm still somewhat confused about how this breaks. If we move
".git/config" out of the way, then we have no version indicator and
presumably we should guess GIT_HASH_SHA1. Which is what's happening if
we fail to call repo_set_hash_algo(), no?  In other words, wouldn't
repo_set_hash_algo() always be a noop in that case?

I get why adding the code snippet above would cause that assumption to
break, but I am just not sure why we would add that code snippet. ;)

I also get why read_repository_format() doing this in patch 3 would be a
problem:

  +       if (format->version == -1) {
  +               clear_repository_format(format);
  +               format->version = -1;
  +       }

but doesn't that point out that clear_repository_format() should be
setting hash_algo to GIT_HASH_SHA1 as the default (and likewise "bare =
-1", etc, that is done in that function)?

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] setup: do not use invalid `repository_format`
  2019-01-15 19:31     ` Jeff King
@ 2019-01-17  6:31       ` Martin Ågren
  2019-01-22  7:07         ` Jeff King
  0 siblings, 1 reply; 40+ messages in thread
From: Martin Ågren @ 2019-01-17  6:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git, brian m . carlson

On Tue, 15 Jan 2019 at 20:31, Jeff King <peff@peff.net> wrote:
>
> On Mon, Jan 14, 2019 at 07:34:56PM +0100, Martin Ågren wrote:
>
> > This can be observed by adding this to the end of
> > `read_repository_format()`:
> >
> >       if (format->version == -1)
> >               format->hash_algo = 0; /* no-one should peek at this! */
> >
> > Check that `version` is non-negative before using `hash_algo`.

> I'm still somewhat confused about how this breaks. If we move
> ".git/config" out of the way, then we have no version indicator and
> presumably we should guess GIT_HASH_SHA1. Which is what's happening if
> we fail to call repo_set_hash_algo(), no?  In other words, wouldn't
> repo_set_hash_algo() always be a noop in that case?
>
> I get why adding the code snippet above would cause that assumption to
> break, but I am just not sure why we would add that code snippet. ;)
>
> I also get why read_repository_format() doing this in patch 3 would be a
> problem:
>
>   +       if (format->version == -1) {
>   +               clear_repository_format(format);
>   +               format->version = -1;
>   +       }
>
> but doesn't that point out that clear_repository_format() should be
> setting hash_algo to GIT_HASH_SHA1 as the default (and likewise "bare =
> -1", etc, that is done in that function)?

Something like the below on top of this series (then rebased). (The last
hunk below is a revert of this patch.)

I'd like to think of the situation before this patch above as a
situation where the API promises something and the user uses the API
beyond that. The next patch in this series changes the internals of the
API in a way that is consistent with the promise made, but which ends up
affecting an over-eager user.

What this patch above does is to make the user do what the API promise
allows them to do, i.e., no more shortcuts. What you're saying is, why
isn't the promise stronger? So the user won't have to think as much?

So in particular, why doesn't `clear...()` and the error path in
`read_...()` impose sane, usable defaults? My first concern is that it
means we need to make a stronger promise, which might then be hard to
back away from, if we want to. Maybe we'll never want to...

My second concern is, what should we be falling back to, going forward?
At some point, the hash indicated by `REPOSITORY_FORMAT_INIT` will be
SHA-256. Before that, and as soon as we support both hashes, what if we
pick up SHA-256 before stumbling on some other piece of the config --
should we now reset the struct to indicate SHA-1, or rather keep the
SHA-256 value, which by itself is valid? (The same could be argued now,
for something other than hash functions, but the SHA-1/256 example might
be more obvious in the context of this patch.)

My third worry is that we should then equip `clear_...()` or at least
the error path of `read_...()` with some logic to keep "as much as
possible" of what we've picked up and reset the rest, all the while
making sure we don't end up with something self-contradicting or stupid.
After all, we'll have promised the users that they can ignore any errors
and just run ahead.

Maybe I'm worrying way too much, and I shouldn't be so afraid of making
a stronger promise here and now because of vague slippery-slope thinking.

Thanks for pushing back and forcing me to articulate my thinking.

Martin


diff --git a/cache.h b/cache.h
index 3ef63d27c4..acd86e9f9f 100644
--- a/cache.h
+++ b/cache.h
@@ -974,15 +974,21 @@ struct repository_format {
 
 /**
  * Always use this to initialize a `struct repository_format`
- * to a well-defined state before calling `read_repository()`.
+ * to a well-defined, default state before calling
+ * `read_repository()`.
  */
-#define REPOSITORY_FORMAT_INIT { 0 }
+#define REPOSITORY_FORMAT_INIT (struct repository_format){ \
+				 .version = -1, \
+				 .is_bare = -1, \
+				 .hash_algo = GIT_HASH_SHA1, \
+				 .unknown_extensions = STRING_LIST_INIT_DUP, \
+			       }
 
 /*
  * Read the repository format characteristics from the config file "path" into
  * "format" struct. Returns the numeric version. On error, -1 is returned,
  * format->version is set to -1, and all other fields in the struct are
- * undefined.
+ * set to the default configuration (REPOSITORY_FORMAT_INIT).
  */
 int read_repository_format(struct repository_format *format, const char *path);
 
diff --git a/setup.c b/setup.c
index 70d9007ae5..f3ea479ad9 100644
--- a/setup.c
+++ b/setup.c
@@ -511,15 +511,9 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 int read_repository_format(struct repository_format *format, const char *path)
 {
 	clear_repository_format(format);
-	format->version = -1;
-	format->is_bare = -1;
-	format->hash_algo = GIT_HASH_SHA1;
-	string_list_init(&format->unknown_extensions, 1);
 	git_config_from_file(check_repo_format, path, format);
-	if (format->version == -1) {
+	if (format->version == -1)
 		clear_repository_format(format);
-		format->version = -1;
-	}
 	return format->version;
 }
 
@@ -528,7 +522,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
-	memset(format, 0, sizeof(*format));
+	*format = REPOSITORY_FORMAT_INIT;
 }
 
 int verify_repository_format(const struct repository_format *format,
@@ -1152,7 +1146,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository && repo_fmt.version > -1)
+		if (startup_info->have_repository)
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 	}
 

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] setup: do not use invalid `repository_format`
  2019-01-17  6:31       ` Martin Ågren
@ 2019-01-22  7:07         ` Jeff King
  2019-01-22 13:34           ` Martin Ågren
  0 siblings, 1 reply; 40+ messages in thread
From: Jeff King @ 2019-01-22  7:07 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, brian m . carlson

On Thu, Jan 17, 2019 at 07:31:14AM +0100, Martin Ågren wrote:

> > I also get why read_repository_format() doing this in patch 3 would be a
> > problem:
> >
> >   +       if (format->version == -1) {
> >   +               clear_repository_format(format);
> >   +               format->version = -1;
> >   +       }
> >
> > but doesn't that point out that clear_repository_format() should be
> > setting hash_algo to GIT_HASH_SHA1 as the default (and likewise "bare =
> > -1", etc, that is done in that function)?
> 
> Something like the below on top of this series (then rebased). (The last
> hunk below is a revert of this patch.)

Yes, that's exactly what I had in mind. Usually our clear() functions
put the struct back into some default state from which it can be used
gain. But the state after clear() here (without the patch below) is
something that nobody is ever expected to look at.

Granted, the only function which fills it in is read_...(), and it sets
those defaults itself. But it just seems to me if we're going to have to
put _something_ in the struct to initialize or clear it, it might as
well be those.

> I'd like to think of the situation before this patch above as a
> situation where the API promises something and the user uses the API
> beyond that. The next patch in this series changes the internals of the
> API in a way that is consistent with the promise made, but which ends up
> affecting an over-eager user.

As with many parts of Git, there really isn't a clear promise. :) I
don't think you're wrong at all about the current state of things. I'm
mostly basing my comments on "what would I _expect_ the promise to be
based on our general patterns". If that's far from what we promise now,
then it's a hassle to convert. But I think it's actually pretty close.

> What this patch above does is to make the user do what the API promise
> allows them to do, i.e., no more shortcuts. What you're saying is, why
> isn't the promise stronger? So the user won't have to think as much?
> 
> So in particular, why doesn't `clear...()` and the error path in
> `read_...()` impose sane, usable defaults? My first concern is that it
> means we need to make a stronger promise, which might then be hard to
> back away from, if we want to. Maybe we'll never want to...

I'm not too worried about that personally. I think the more likely
problem is that the API is misunderstood and misused. ;)

> My second concern is, what should we be falling back to, going forward?
> At some point, the hash indicated by `REPOSITORY_FORMAT_INIT` will be
> SHA-256. Before that, and as soon as we support both hashes, what if we
> pick up SHA-256 before stumbling on some other piece of the config --
> should we now reset the struct to indicate SHA-1, or rather keep the
> SHA-256 value, which by itself is valid? (The same could be argued now,
> for something other than hash functions, but the SHA-1/256 example might
> be more obvious in the context of this patch.)

I'd think this would _always_ be sha-1. Because it's not about "what's
the default for this program running". It's about "what have I read from
this on-disk repo config". And the rule there is "if they don't say
otherwise, it is sha1". That won't change even in a sha256 world,
because we'll maintain backwards-compatibility with legacy repositories
forever.

Now if your next question is: "does any caller misuse this as more than
looking at the repo format", I don't know the answer for sure. That
would be worth poking at (or perhaps having just poked yourself, you
might have an idea already).

> My third worry is that we should then equip `clear_...()` or at least
> the error path of `read_...()` with some logic to keep "as much as
> possible" of what we've picked up and reset the rest, all the while
> making sure we don't end up with something self-contradicting or stupid.
> After all, we'll have promised the users that they can ignore any errors
> and just run ahead.

I think clear() should always throw everything away. Saving partial bits
from the error path of read() is harder. My gut says "no", but I agree
that's a trickier question. I think the real-world thing here is: we're
reading repo config and see an extensions.* field that says "use
sha256". But then we encounter an error, or don't otherwise have a
version. What do we do?

If that's an undefined setup (and I think it is -- if you're using
extensions.* you're supposed to always set the version field), then I
don't know that it really matters that much. But throwing the whole
thing away (even if it means a buggy code path is more likely to use
sha1) seems OK to me.

> Maybe I'm worrying way too much, and I shouldn't be so afraid of making
> a stronger promise here and now because of vague slippery-slope thinking.
> 
> Thanks for pushing back and forcing me to articulate my thinking.

For the record, I can live with it either way. There are so many funky
little setup corner cases in the code already, and we don't even really
have a real-world case to dissect at this point. So the right thing may
also just be to finish this patch series as quickly as possible and move
on to something more useful. :)

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] setup: do not use invalid `repository_format`
  2019-01-22  7:07         ` Jeff King
@ 2019-01-22 13:34           ` Martin Ågren
  2019-01-22 21:45             ` [PATCH v3 0/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
  0 siblings, 1 reply; 40+ messages in thread
From: Martin Ågren @ 2019-01-22 13:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, brian m . carlson

On Tue, 22 Jan 2019 at 08:07, Jeff King <peff@peff.net> wrote:
>
> On Thu, Jan 17, 2019 at 07:31:14AM +0100, Martin Ågren wrote:
>
> > Something like the below on top of this series (then rebased). (The last
> > hunk below is a revert of this patch.)
>
> Yes, that's exactly what I had in mind. Usually our clear() functions
> put the struct back into some default state from which it can be used
> gain. But the state after clear() here (without the patch below) is
> something that nobody is ever expected to look at.

> > So in particular, why doesn't `clear...()` and the error path in
> > `read_...()` impose sane, usable defaults? My first concern is that it
> > means we need to make a stronger promise, which might then be hard to
> > back away from, if we want to. Maybe we'll never want to...
>
> I'm not too worried about that personally. I think the more likely
> problem is that the API is misunderstood and misused. ;)

Heh. Agreed. :-)

> Now if your next question is: "does any caller misuse this as more than
> looking at the repo format", I don't know the answer for sure. That
> would be worth poking at (or perhaps having just poked yourself, you
> might have an idea already).

Not really. I've stumbled around a little, but I'll need to do that some
more.

> For the record, I can live with it either way. There are so many funky
> little setup corner cases in the code already, and we don't even really
> have a real-world case to dissect at this point. So the right thing may
> also just be to finish this patch series as quickly as possible and move
> on to something more useful. :)

I rebased the "something like this?" into this series yesterday and I
think the end result is better, but also that the way there is clearer,
mostly because this patch is then gone. I wanted to double-check it
tonight and submit it. I'll do that tonight.

Thank you for your comments. They're really helpful.


Martin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 0/2] setup: fix memory leaks with `struct repository_format`
  2019-01-22 13:34           ` Martin Ågren
@ 2019-01-22 21:45             ` Martin Ågren
  2019-01-22 21:45               ` [PATCH v3 1/2] setup: free old value before setting `work_tree` Martin Ågren
  2019-01-22 21:45               ` [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
  0 siblings, 2 replies; 40+ messages in thread
From: Martin Ågren @ 2019-01-22 21:45 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

On Tue, 22 Jan 2019 at 14:34, Martin Ågren <martin.agren@gmail.com> wrote:
>
> On Tue, 22 Jan 2019 at 08:07, Jeff King <peff@peff.net> wrote:
>
> > For the record, I can live with it either way. There are so many funky
> > little setup corner cases in the code already, and we don't even really
> > have a real-world case to dissect at this point. So the right thing may
> > also just be to finish this patch series as quickly as possible and move
> > on to something more useful. :)
>
> I rebased the "something like this?" into this series yesterday and I
> think the end result is better, but also that the way there is clearer,
> mostly because this patch is then gone. I wanted to double-check it
> tonight and submit it. I'll do that tonight.

Here's that reroll. I now reset the entire struct in the error path of
`clear_...()`. Thus, the user that is reading `repo_fmt.hash_algo`
despite not being supposed to, can keep reading it, now knowing that the
value has a default value.

I also expanded on the documentation a little to point out that we'll
reset to the default struct state if we don't find any
"core.repositoryformatversion".

Martin

Martin Ågren (2):
  setup: free old value before setting `work_tree`
  setup: fix memory leaks with `struct repository_format`

 cache.h           | 27 ++++++++++++++++++++++++---
 builtin/init-db.c |  3 ++-
 repository.c      |  3 ++-
 setup.c           | 33 +++++++++++++++++++++------------
 worktree.c        |  4 +++-
 5 files changed, 52 insertions(+), 18 deletions(-)

-- 
2.20.1.98.gecbdaf0899


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 1/2] setup: free old value before setting `work_tree`
  2019-01-22 21:45             ` [PATCH v3 0/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
@ 2019-01-22 21:45               ` Martin Ågren
  2019-01-22 21:45               ` [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
  1 sibling, 0 replies; 40+ messages in thread
From: Martin Ågren @ 2019-01-22 21:45 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

Before assigning to `data->work_tree` in `read_worktree_config()`, free
any value we might already have picked up, so that we do not leak it.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.c b/setup.c
index 1be5037f12..bb633942bb 100644
--- a/setup.c
+++ b/setup.c
@@ -411,6 +411,7 @@ static int read_worktree_config(const char *var, const char *value, void *vdata)
 	} else if (strcmp(var, "core.worktree") == 0) {
 		if (!value)
 			return config_error_nonbool(var);
+		free(data->work_tree);
 		data->work_tree = xstrdup(value);
 	}
 	return 0;
-- 
2.20.1.98.gecbdaf0899


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format`
  2019-01-22 21:45             ` [PATCH v3 0/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
  2019-01-22 21:45               ` [PATCH v3 1/2] setup: free old value before setting `work_tree` Martin Ågren
@ 2019-01-22 21:45               ` Martin Ågren
  2019-01-23  5:57                 ` Jeff King
  1 sibling, 1 reply; 40+ messages in thread
From: Martin Ågren @ 2019-01-22 21:45 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

After we set up a `struct repository_format`, it owns various pieces of
allocated memory. We then either use those members, because we decide we
want to use the "candidate" repository format, or we discard the
candidate / scratch space. In the first case, we transfer ownership of
the memory to a few global variables. In the latter case, we just
silently drop the struct and end up leaking memory.

Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a
function `clear_repository_format()`, to be used on each side of
`read_repository_format()`. To have a clear and simple memory ownership,
let all users of `struct repository_format` duplicate the strings that
they take from it, rather than stealing the pointers.

Call `clear_...()` at the start of `read_...()` instead of just zeroing
the struct, since we sometimes enter the function multiple times. This
means that it is important to initialize the struct before calling
`read_...()`, so document that. Teach `read_...()` to clear the struct
upon an error, so that it is reset to a safe state, and document this.

About that very last point: In `setup_git_directory_gently()`, we look
at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we
weren't actually supposed to do per the API. After this commit, that's
ok.

We inherit the existing code's combining "error" and "no version found".
Both are signalled through `version == -1` and now both cause us to
clear any partial configuration we have picked up. For "extensions.*",
that's fine, since they require a positive version number. For
"core.bare" and "core.worktree", we're already verifying that we have a
non-negative version number before using them.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 cache.h           | 27 ++++++++++++++++++++++++---
 builtin/init-db.c |  3 ++-
 repository.c      |  3 ++-
 setup.c           | 32 ++++++++++++++++++++------------
 worktree.c        |  4 +++-
 5 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index ca36b44ee0..ec7c043412 100644
--- a/cache.h
+++ b/cache.h
@@ -972,14 +972,35 @@ struct repository_format {
 	struct string_list unknown_extensions;
 };
 
+/*
+ * Always use this to initialize a `struct repository_format`
+ * to a well-defined, default state before calling
+ * `read_repository()`.
+ */
+#define REPOSITORY_FORMAT_INIT \
+{ \
+	.version = -1, \
+	.is_bare = -1, \
+	.hash_algo = GIT_HASH_SHA1, \
+	.unknown_extensions = STRING_LIST_INIT_DUP, \
+}
+
 /*
  * Read the repository format characteristics from the config file "path" into
- * "format" struct. Returns the numeric version. On error, -1 is returned,
- * format->version is set to -1, and all other fields in the struct are
- * undefined.
+ * "format" struct. Returns the numeric version. On error, or if no version is
+ * found in the configuration, -1 is returned, format->version is set to -1,
+ * and all other fields in the struct are set to the default configuration
+ * (REPOSITORY_FORMAT_INIT). Always initialize the struct using
+ * REPOSITORY_FORMAT_INIT before calling this function.
  */
 int read_repository_format(struct repository_format *format, const char *path);
 
+/*
+ * Free the memory held onto by `format`, but not the struct itself.
+ * (No need to use this after `read_repository_format()` fails.)
+ */
+void clear_repository_format(struct repository_format *format);
+
 /*
  * Verify that the repository described by repository_format is something we
  * can read. If it is, return 0. Otherwise, return -1, and "err" will describe
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 41faffd28d..04c60eaad5 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -96,7 +96,7 @@ static void copy_templates(const char *template_dir)
 	struct strbuf path = STRBUF_INIT;
 	struct strbuf template_path = STRBUF_INIT;
 	size_t template_len;
-	struct repository_format template_format;
+	struct repository_format template_format = REPOSITORY_FORMAT_INIT;
 	struct strbuf err = STRBUF_INIT;
 	DIR *dir;
 	char *to_free = NULL;
@@ -148,6 +148,7 @@ static void copy_templates(const char *template_dir)
 	free(to_free);
 	strbuf_release(&path);
 	strbuf_release(&template_path);
+	clear_repository_format(&template_format);
 }
 
 static int git_init_db_config(const char *k, const char *v, void *cb)
diff --git a/repository.c b/repository.c
index 7b02e1dffa..df88705574 100644
--- a/repository.c
+++ b/repository.c
@@ -148,7 +148,7 @@ int repo_init(struct repository *repo,
 	      const char *gitdir,
 	      const char *worktree)
 {
-	struct repository_format format;
+	struct repository_format format = REPOSITORY_FORMAT_INIT;
 	memset(repo, 0, sizeof(*repo));
 
 	repo->objects = raw_object_store_new();
@@ -165,6 +165,7 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	clear_repository_format(&format);
 	return 0;
 
 error:
diff --git a/setup.c b/setup.c
index bb633942bb..14c2f8ce24 100644
--- a/setup.c
+++ b/setup.c
@@ -477,7 +477,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	repository_format_partial_clone = candidate->partial_clone;
+	repository_format_partial_clone = xstrdup_or_null(candidate->partial_clone);
 	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 
@@ -500,11 +500,9 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 		}
 		if (candidate->work_tree) {
 			free(git_work_tree_cfg);
-			git_work_tree_cfg = candidate->work_tree;
+			git_work_tree_cfg = xstrdup(candidate->work_tree);
 			inside_work_tree = -1;
 		}
-	} else {
-		free(candidate->work_tree);
 	}
 
 	return 0;
@@ -512,15 +510,21 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 
 int read_repository_format(struct repository_format *format, const char *path)
 {
-	memset(format, 0, sizeof(*format));
-	format->version = -1;
-	format->is_bare = -1;
-	format->hash_algo = GIT_HASH_SHA1;
-	string_list_init(&format->unknown_extensions, 1);
+	clear_repository_format(format);
 	git_config_from_file(check_repo_format, path, format);
+	if (format->version == -1)
+		clear_repository_format(format);
 	return format->version;
 }
 
+void clear_repository_format(struct repository_format *format)
+{
+	string_list_clear(&format->unknown_extensions, 0);
+	free(format->work_tree);
+	free(format->partial_clone);
+	*format = (struct repository_format)REPOSITORY_FORMAT_INIT;
+}
+
 int verify_repository_format(const struct repository_format *format,
 			     struct strbuf *err)
 {
@@ -1008,7 +1012,7 @@ int discover_git_directory(struct strbuf *commondir,
 	struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT;
 	size_t gitdir_offset = gitdir->len, cwd_len;
 	size_t commondir_offset = commondir->len;
-	struct repository_format candidate;
+	struct repository_format candidate = REPOSITORY_FORMAT_INIT;
 
 	if (strbuf_getcwd(&dir))
 		return -1;
@@ -1045,9 +1049,11 @@ int discover_git_directory(struct strbuf *commondir,
 		strbuf_release(&err);
 		strbuf_setlen(commondir, commondir_offset);
 		strbuf_setlen(gitdir, gitdir_offset);
+		clear_repository_format(&candidate);
 		return -1;
 	}
 
+	clear_repository_format(&candidate);
 	return 0;
 }
 
@@ -1056,7 +1062,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 	static struct strbuf cwd = STRBUF_INIT;
 	struct strbuf dir = STRBUF_INIT, gitdir = STRBUF_INIT;
 	const char *prefix;
-	struct repository_format repo_fmt;
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 
 	/*
 	 * We may have read an incomplete configuration before
@@ -1146,6 +1152,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 
 	strbuf_release(&dir);
 	strbuf_release(&gitdir);
+	clear_repository_format(&repo_fmt);
 
 	return prefix;
 }
@@ -1203,9 +1210,10 @@ int git_config_perm(const char *var, const char *value)
 
 void check_repository_format(void)
 {
-	struct repository_format repo_fmt;
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 	check_repository_format_gently(get_git_dir(), &repo_fmt, NULL);
 	startup_info->have_repository = 1;
+	clear_repository_format(&repo_fmt);
 }
 
 /*
diff --git a/worktree.c b/worktree.c
index d6a0ee7f73..b45bfeb9d3 100644
--- a/worktree.c
+++ b/worktree.c
@@ -444,7 +444,7 @@ int submodule_uses_worktrees(const char *path)
 	DIR *dir;
 	struct dirent *d;
 	int ret = 0;
-	struct repository_format format;
+	struct repository_format format = REPOSITORY_FORMAT_INIT;
 
 	submodule_gitdir = git_pathdup_submodule(path, "%s", "");
 	if (!submodule_gitdir)
@@ -462,8 +462,10 @@ int submodule_uses_worktrees(const char *path)
 	read_repository_format(&format, sb.buf);
 	if (format.version != 0) {
 		strbuf_release(&sb);
+		clear_repository_format(&format);
 		return 1;
 	}
+	clear_repository_format(&format);
 
 	/* Replace config by worktrees. */
 	strbuf_setlen(&sb, sb.len - strlen("config"));
-- 
2.20.1.98.gecbdaf0899


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format`
  2019-01-22 21:45               ` [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
@ 2019-01-23  5:57                 ` Jeff King
  2019-01-24  0:14                   ` brian m. carlson
  2019-01-25 19:24                   ` Martin Ågren
  0 siblings, 2 replies; 40+ messages in thread
From: Jeff King @ 2019-01-23  5:57 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, brian m . carlson

On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote:

> Call `clear_...()` at the start of `read_...()` instead of just zeroing
> the struct, since we sometimes enter the function multiple times. This
> means that it is important to initialize the struct before calling
> `read_...()`, so document that.

This part is a little counter-intuitive to me. Is anybody ever going to
pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT?

If so, might it be kinder for read_...() to not assume anything about
the incoming struct, and initialize it from scratch? I.e., not to use
clear() but just do the initialization step?

A caller which calls read_() multiple times would presumably have an
intervening clear (either their own, or the one done on an error return
from the read function).

Other than that minor nit, I like the overall shape of this.

One interesting tidbit:

> +/*
> + * Always use this to initialize a `struct repository_format`
> + * to a well-defined, default state before calling
> + * `read_repository()`.
> + */
> +#define REPOSITORY_FORMAT_INIT \
> +{ \
> +	.version = -1, \
> +	.is_bare = -1, \
> +	.hash_algo = GIT_HASH_SHA1, \
> +	.unknown_extensions = STRING_LIST_INIT_DUP, \
> +}
> [...]
> +	struct repository_format candidate = REPOSITORY_FORMAT_INIT;

This uses designated initializers, which is a C99-ism, but one we've
used previously and feel confident in. But...

> +void clear_repository_format(struct repository_format *format)
> +{
> +	string_list_clear(&format->unknown_extensions, 0);
> +	free(format->work_tree);
> +	free(format->partial_clone);
> +	*format = (struct repository_format)REPOSITORY_FORMAT_INIT;
> +}

...this uses that expression not as an initializer, but as a compound
literal. That's also C99, but AFAIK it's the first usage in our code
base. I don't know if it will cause problems or not.

The "old" way to do it is:

  struct repository_format foo = REPOSITORY_FORMAT_INIT;
  memcpy(format, &foo, sizeof(foo));

Given how simple it is to fix if it turns out to be a problem, I'm OK
including it as a weather balloon.

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format`
  2019-01-23  5:57                 ` Jeff King
@ 2019-01-24  0:14                   ` brian m. carlson
  2019-01-25 19:25                     ` Martin Ågren
  2019-01-25 19:24                   ` Martin Ågren
  1 sibling, 1 reply; 40+ messages in thread
From: brian m. carlson @ 2019-01-24  0:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Ågren, git

[-- Attachment #1: Type: text/plain, Size: 1236 bytes --]

On Wed, Jan 23, 2019 at 12:57:05AM -0500, Jeff King wrote:
> This uses designated initializers, which is a C99-ism, but one we've
> used previously and feel confident in. But...
> 
> > +void clear_repository_format(struct repository_format *format)
> > +{
> > +	string_list_clear(&format->unknown_extensions, 0);
> > +	free(format->work_tree);
> > +	free(format->partial_clone);
> > +	*format = (struct repository_format)REPOSITORY_FORMAT_INIT;
> > +}
> 
> ...this uses that expression not as an initializer, but as a compound
> literal. That's also C99, but AFAIK it's the first usage in our code
> base. I don't know if it will cause problems or not.
> 
> The "old" way to do it is:
> 
>   struct repository_format foo = REPOSITORY_FORMAT_INIT;
>   memcpy(format, &foo, sizeof(foo));
> 
> Given how simple it is to fix if it turns out to be a problem, I'm OK
> including it as a weather balloon.

It's my understanding that MSVC doesn't support this construct. If we
care about supporting MSVC, then we need to write it without the
compound literal. MSVC doesn't support any C99 feature that is not also
in C++, unfortunately.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format`
  2019-01-23  5:57                 ` Jeff King
  2019-01-24  0:14                   ` brian m. carlson
@ 2019-01-25 19:24                   ` Martin Ågren
  2019-01-25 19:51                     ` Jeff King
  1 sibling, 1 reply; 40+ messages in thread
From: Martin Ågren @ 2019-01-25 19:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, brian m . carlson

On Wed, 23 Jan 2019 at 06:57, Jeff King <peff@peff.net> wrote:
>
> On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote:
>
> > Call `clear_...()` at the start of `read_...()` instead of just zeroing
> > the struct, since we sometimes enter the function multiple times. This
> > means that it is important to initialize the struct before calling
> > `read_...()`, so document that.
>
> This part is a little counter-intuitive to me. Is anybody ever going to
> pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT?

I do update all users in git.git, but yeah, out-of-tree users and
in-flight topics would segfault.

> If so, might it be kinder for read_...() to not assume anything about
> the incoming struct, and initialize it from scratch? I.e., not to use
> clear() but just do the initialization step?

I have some vague memory from going down that route and giving up. Now
that I'm looking at it again, I think we can at least try to do
something. We can make sure that "external" users that call into setup.c
are fine (they'll leak, but won't crash). Out-of-tree users inside
setup.c will still be able to trip on this. I don't have much spare time
over the next few days, but I'll get to this.

Or we could accept that we may leak when we end up calling `read()`
multiple times (I could catch all leaks now, but new ones might sneak in
after that) and come back to this after X months, when we can perhaps
afford to be a bit more aggressive.

I guess we could just rename the struct to have the compiler catch
out-of-tree users...

> A caller which calls read_() multiple times would presumably have an
> intervening clear (either their own, or the one done on an error return
> from the read function).
>
> Other than that minor nit, I like the overall shape of this.

Thank you.

Martin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format`
  2019-01-24  0:14                   ` brian m. carlson
@ 2019-01-25 19:25                     ` Martin Ågren
  0 siblings, 0 replies; 40+ messages in thread
From: Martin Ågren @ 2019-01-25 19:25 UTC (permalink / raw)
  To: brian m. carlson, Jeff King, Martin Ågren, Git Mailing List

On Thu, 24 Jan 2019 at 01:15, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On Wed, Jan 23, 2019 at 12:57:05AM -0500, Jeff King wrote:
> > > +void clear_repository_format(struct repository_format *format)
> > > +{
> > > +   string_list_clear(&format->unknown_extensions, 0);
> > > +   free(format->work_tree);
> > > +   free(format->partial_clone);
> > > +   *format = (struct repository_format)REPOSITORY_FORMAT_INIT;
> > > +}
> >
> > ...this uses that expression not as an initializer, but as a compound
> > literal. That's also C99, but AFAIK it's the first usage in our code
> > base. I don't know if it will cause problems or not.

> > Given how simple it is to fix if it turns out to be a problem, I'm OK
> > including it as a weather balloon.

Thanks for pointing out this potential problem.

> It's my understanding that MSVC doesn't support this construct. If we
> care about supporting MSVC, then we need to write it without the
> compound literal. MSVC doesn't support any C99 feature that is not also
> in C++, unfortunately.

Ok, better play it safe then. Thanks.

Martin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format`
  2019-01-25 19:24                   ` Martin Ågren
@ 2019-01-25 19:51                     ` Jeff King
  2019-02-25 19:21                       ` Martin Ågren
  0 siblings, 1 reply; 40+ messages in thread
From: Jeff King @ 2019-01-25 19:51 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, brian m . carlson

On Fri, Jan 25, 2019 at 08:24:35PM +0100, Martin Ågren wrote:

> On Wed, 23 Jan 2019 at 06:57, Jeff King <peff@peff.net> wrote:
> >
> > On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote:
> >
> > > Call `clear_...()` at the start of `read_...()` instead of just zeroing
> > > the struct, since we sometimes enter the function multiple times. This
> > > means that it is important to initialize the struct before calling
> > > `read_...()`, so document that.
> >
> > This part is a little counter-intuitive to me. Is anybody ever going to
> > pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT?
> 
> I do update all users in git.git, but yeah, out-of-tree users and
> in-flight topics would segfault.
> 
> > If so, might it be kinder for read_...() to not assume anything about
> > the incoming struct, and initialize it from scratch? I.e., not to use
> > clear() but just do the initialization step?
> 
> I have some vague memory from going down that route and giving up. Now
> that I'm looking at it again, I think we can at least try to do
> something. We can make sure that "external" users that call into setup.c
> are fine (they'll leak, but won't crash). Out-of-tree users inside
> setup.c will still be able to trip on this. I don't have much spare time
> over the next few days, but I'll get to this.
> 
> Or we could accept that we may leak when we end up calling `read()`
> multiple times (I could catch all leaks now, but new ones might sneak in
> after that) and come back to this after X months, when we can perhaps
> afford to be a bit more aggressive.
> 
> I guess we could just rename the struct to have the compiler catch
> out-of-tree users...

I'm less worried about out-of-tree users, and more concerned with just
having a calling convention that matches usual conventions (and is
harder to get wrong).

It's a pretty minor point, though, so I can live with it either way.

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format`
  2019-01-25 19:51                     ` Jeff King
@ 2019-02-25 19:21                       ` Martin Ågren
  2019-02-26 17:46                         ` Jeff King
  0 siblings, 1 reply; 40+ messages in thread
From: Martin Ågren @ 2019-02-25 19:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, brian m . carlson

On Fri, 25 Jan 2019 at 20:51, Jeff King <peff@peff.net> wrote:
>
> > On Wed, 23 Jan 2019 at 06:57, Jeff King <peff@peff.net> wrote:
> > >
> > > On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote:
> > >
> > > > Call `clear_...()` at the start of `read_...()` instead of just zeroing
> > > > the struct, since we sometimes enter the function multiple times. This
> > > > means that it is important to initialize the struct before calling
> > > > `read_...()`, so document that.
> > >
> > > This part is a little counter-intuitive to me. Is anybody ever going to
> > > pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT?

> > > If so, might it be kinder for read_...() to not assume anything about
> > > the incoming struct, and initialize it from scratch? I.e., not to use
> > > clear() but just do the initialization step?
>
> I'm less worried about out-of-tree users, and more concerned with just
> having a calling convention that matches usual conventions (and is
> harder to get wrong).
>
> It's a pretty minor point, though, so I can live with it either way.

It's time to resurrect this thread. I've reworked this patch to avoid
the compound literal when re-initing a struct, and I've been going back
and forth on this point about having to initialize to `..._INIT` or risk
crashing. And I keep coming back to thinking that it's not *that*
different from how `STRBUF_INIT` works.

There's the obvious difference that there aren't as many functions to
call, so there's certainly a difference in scale. And you'd think that
you'll always start with `read_...()`, but another potential first
function to call is `clear_...()` (see builtin/init-db.c), in which case
you better have used `..._INIT` first.

I'm tempted to address this point by documenting as good as I can in the
.h-file that one has to use this initializer macro. I'll obviously
convert all users, so copy-paste programming should work fine...

How does that sound to you?

Martin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format`
  2019-02-25 19:21                       ` Martin Ågren
@ 2019-02-26 17:46                         ` Jeff King
  2019-02-28 20:36                           ` [PATCH v4 0/2] " Martin Ågren
  0 siblings, 1 reply; 40+ messages in thread
From: Jeff King @ 2019-02-26 17:46 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, brian m . carlson

On Mon, Feb 25, 2019 at 08:21:07PM +0100, Martin Ågren wrote:

> It's time to resurrect this thread. I've reworked this patch to avoid
> the compound literal when re-initing a struct, and I've been going back
> and forth on this point about having to initialize to `..._INIT` or risk
> crashing. And I keep coming back to thinking that it's not *that*
> different from how `STRBUF_INIT` works.
> 
> There's the obvious difference that there aren't as many functions to
> call, so there's certainly a difference in scale. And you'd think that
> you'll always start with `read_...()`, but another potential first
> function to call is `clear_...()` (see builtin/init-db.c), in which case
> you better have used `..._INIT` first.
> 
> I'm tempted to address this point by documenting as good as I can in the
> .h-file that one has to use this initializer macro. I'll obviously
> convert all users, so copy-paste programming should work fine...
> 
> How does that sound to you?

It sounds pretty good.

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 0/2] setup: fix memory leaks with `struct repository_format`
  2019-02-26 17:46                         ` Jeff King
@ 2019-02-28 20:36                           ` Martin Ågren
  2019-02-28 20:36                             ` [PATCH v4 1/2] setup: free old value before setting `work_tree` Martin Ågren
                                               ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Martin Ågren @ 2019-02-28 20:36 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

This is a follow-up to v3 [1] from about a month ago. Patch 1 is
unchanged; patch 2 provides some additional documentation of the
initialization that is required, plus I've gotten rid of the compound
literal. Range-diff below.

Thanks Peff and brian for very helpful comments and discussion.

Martin
 
[1] https://public-inbox.org/git/cover.1548186510.git.martin.agren@gmail.com/

Martin Ågren (2):
  setup: free old value before setting `work_tree`
  setup: fix memory leaks with `struct repository_format`

 cache.h           | 31 ++++++++++++++++++++++++++++---
 builtin/init-db.c |  3 ++-
 repository.c      |  3 ++-
 setup.c           | 40 ++++++++++++++++++++++++++++------------
 worktree.c        |  4 +++-
 5 files changed, 63 insertions(+), 18 deletions(-)

Range-diff against v3:
1:  13019979b8 = 1:  13019979b8 setup: free old value before setting `work_tree`
2:  e0c4a73119 ! 2:  b21704c1e4 setup: fix memory leaks with `struct repository_format`
    @@ -16,15 +16,16 @@
         they take from it, rather than stealing the pointers.
     
         Call `clear_...()` at the start of `read_...()` instead of just zeroing
    -    the struct, since we sometimes enter the function multiple times. This
    -    means that it is important to initialize the struct before calling
    -    `read_...()`, so document that. Teach `read_...()` to clear the struct
    -    upon an error, so that it is reset to a safe state, and document this.
    +    the struct, since we sometimes enter the function multiple times. Thus,
    +    it is important to initialize the struct before calling `read_...()`, so
    +    document that. It's also important because we might not even call
    +    `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c.
     
    -    About that very last point: In `setup_git_directory_gently()`, we look
    -    at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we
    +    Teach `read_...()` to clear the struct on error, so that it is reset to
    +    a safe state, and document this. (In `setup_git_directory_gently()`, we
    +    look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we
         weren't actually supposed to do per the API. After this commit, that's
    -    ok.
    +    ok.)
     
         We inherit the existing code's combining "error" and "no version found".
         Both are signalled through `version == -1` and now both cause us to
    @@ -34,11 +35,21 @@
         non-negative version number before using them.
     
         Signed-off-by: Martin Ågren <martin.agren@gmail.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/cache.h b/cache.h
      --- a/cache.h
      +++ b/cache.h
    +@@
    + extern const char *core_partial_clone_filter_default;
    + extern int repository_format_worktree_config;
    + 
    ++/*
    ++ * You _have_ to initialize a `struct repository_format` using
    ++ * `= REPOSITORY_FORMAT_INIT` before calling `read_repository_format()`.
    ++ */
    + struct repository_format {
    + 	int version;
    + 	int precious_objects;
     @@
      	struct string_list unknown_extensions;
      };
    @@ -146,8 +157,15 @@
      	}
      
      	return 0;
    -@@
    + }
      
    ++static void init_repository_format(struct repository_format *format)
    ++{
    ++	const struct repository_format fresh = REPOSITORY_FORMAT_INIT;
    ++
    ++	memcpy(format, &fresh, sizeof(fresh));
    ++}
    ++
      int read_repository_format(struct repository_format *format, const char *path)
      {
     -	memset(format, 0, sizeof(*format));
    @@ -167,7 +185,7 @@
     +	string_list_clear(&format->unknown_extensions, 0);
     +	free(format->work_tree);
     +	free(format->partial_clone);
    -+	*format = (struct repository_format)REPOSITORY_FORMAT_INIT;
    ++	init_repository_format(format);
     +}
     +
      int verify_repository_format(const struct repository_format *format,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 1/2] setup: free old value before setting `work_tree`
  2019-02-28 20:36                           ` [PATCH v4 0/2] " Martin Ågren
@ 2019-02-28 20:36                             ` Martin Ågren
  2019-02-28 20:36                             ` [PATCH v4 2/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
  2019-03-06  4:56                             ` [PATCH v4 0/2] " Jeff King
  2 siblings, 0 replies; 40+ messages in thread
From: Martin Ågren @ 2019-02-28 20:36 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

Before assigning to `data->work_tree` in `read_worktree_config()`, free
any value we might already have picked up, so that we do not leak it.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.c b/setup.c
index 1be5037f12..bb633942bb 100644
--- a/setup.c
+++ b/setup.c
@@ -411,6 +411,7 @@ static int read_worktree_config(const char *var, const char *value, void *vdata)
 	} else if (strcmp(var, "core.worktree") == 0) {
 		if (!value)
 			return config_error_nonbool(var);
+		free(data->work_tree);
 		data->work_tree = xstrdup(value);
 	}
 	return 0;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 2/2] setup: fix memory leaks with `struct repository_format`
  2019-02-28 20:36                           ` [PATCH v4 0/2] " Martin Ågren
  2019-02-28 20:36                             ` [PATCH v4 1/2] setup: free old value before setting `work_tree` Martin Ågren
@ 2019-02-28 20:36                             ` Martin Ågren
  2019-03-06  4:56                             ` [PATCH v4 0/2] " Jeff King
  2 siblings, 0 replies; 40+ messages in thread
From: Martin Ågren @ 2019-02-28 20:36 UTC (permalink / raw)
  To: git; +Cc: Jeff King, brian m . carlson

After we set up a `struct repository_format`, it owns various pieces of
allocated memory. We then either use those members, because we decide we
want to use the "candidate" repository format, or we discard the
candidate / scratch space. In the first case, we transfer ownership of
the memory to a few global variables. In the latter case, we just
silently drop the struct and end up leaking memory.

Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a
function `clear_repository_format()`, to be used on each side of
`read_repository_format()`. To have a clear and simple memory ownership,
let all users of `struct repository_format` duplicate the strings that
they take from it, rather than stealing the pointers.

Call `clear_...()` at the start of `read_...()` instead of just zeroing
the struct, since we sometimes enter the function multiple times. Thus,
it is important to initialize the struct before calling `read_...()`, so
document that. It's also important because we might not even call
`read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c.

Teach `read_...()` to clear the struct on error, so that it is reset to
a safe state, and document this. (In `setup_git_directory_gently()`, we
look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we
weren't actually supposed to do per the API. After this commit, that's
ok.)

We inherit the existing code's combining "error" and "no version found".
Both are signalled through `version == -1` and now both cause us to
clear any partial configuration we have picked up. For "extensions.*",
that's fine, since they require a positive version number. For
"core.bare" and "core.worktree", we're already verifying that we have a
non-negative version number before using them.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 cache.h           | 31 ++++++++++++++++++++++++++++---
 builtin/init-db.c |  3 ++-
 repository.c      |  3 ++-
 setup.c           | 39 +++++++++++++++++++++++++++------------
 worktree.c        |  4 +++-
 5 files changed, 62 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index ca36b44ee0..8c32c904c3 100644
--- a/cache.h
+++ b/cache.h
@@ -961,6 +961,10 @@ extern char *repository_format_partial_clone;
 extern const char *core_partial_clone_filter_default;
 extern int repository_format_worktree_config;
 
+/*
+ * You _have_ to initialize a `struct repository_format` using
+ * `= REPOSITORY_FORMAT_INIT` before calling `read_repository_format()`.
+ */
 struct repository_format {
 	int version;
 	int precious_objects;
@@ -972,14 +976,35 @@ struct repository_format {
 	struct string_list unknown_extensions;
 };
 
+/*
+ * Always use this to initialize a `struct repository_format`
+ * to a well-defined, default state before calling
+ * `read_repository()`.
+ */
+#define REPOSITORY_FORMAT_INIT \
+{ \
+	.version = -1, \
+	.is_bare = -1, \
+	.hash_algo = GIT_HASH_SHA1, \
+	.unknown_extensions = STRING_LIST_INIT_DUP, \
+}
+
 /*
  * Read the repository format characteristics from the config file "path" into
- * "format" struct. Returns the numeric version. On error, -1 is returned,
- * format->version is set to -1, and all other fields in the struct are
- * undefined.
+ * "format" struct. Returns the numeric version. On error, or if no version is
+ * found in the configuration, -1 is returned, format->version is set to -1,
+ * and all other fields in the struct are set to the default configuration
+ * (REPOSITORY_FORMAT_INIT). Always initialize the struct using
+ * REPOSITORY_FORMAT_INIT before calling this function.
  */
 int read_repository_format(struct repository_format *format, const char *path);
 
+/*
+ * Free the memory held onto by `format`, but not the struct itself.
+ * (No need to use this after `read_repository_format()` fails.)
+ */
+void clear_repository_format(struct repository_format *format);
+
 /*
  * Verify that the repository described by repository_format is something we
  * can read. If it is, return 0. Otherwise, return -1, and "err" will describe
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 41faffd28d..04c60eaad5 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -96,7 +96,7 @@ static void copy_templates(const char *template_dir)
 	struct strbuf path = STRBUF_INIT;
 	struct strbuf template_path = STRBUF_INIT;
 	size_t template_len;
-	struct repository_format template_format;
+	struct repository_format template_format = REPOSITORY_FORMAT_INIT;
 	struct strbuf err = STRBUF_INIT;
 	DIR *dir;
 	char *to_free = NULL;
@@ -148,6 +148,7 @@ static void copy_templates(const char *template_dir)
 	free(to_free);
 	strbuf_release(&path);
 	strbuf_release(&template_path);
+	clear_repository_format(&template_format);
 }
 
 static int git_init_db_config(const char *k, const char *v, void *cb)
diff --git a/repository.c b/repository.c
index 7b02e1dffa..df88705574 100644
--- a/repository.c
+++ b/repository.c
@@ -148,7 +148,7 @@ int repo_init(struct repository *repo,
 	      const char *gitdir,
 	      const char *worktree)
 {
-	struct repository_format format;
+	struct repository_format format = REPOSITORY_FORMAT_INIT;
 	memset(repo, 0, sizeof(*repo));
 
 	repo->objects = raw_object_store_new();
@@ -165,6 +165,7 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	clear_repository_format(&format);
 	return 0;
 
 error:
diff --git a/setup.c b/setup.c
index bb633942bb..ab6e8fd4c3 100644
--- a/setup.c
+++ b/setup.c
@@ -477,7 +477,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	repository_format_partial_clone = candidate->partial_clone;
+	repository_format_partial_clone = xstrdup_or_null(candidate->partial_clone);
 	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 
@@ -500,27 +500,38 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 		}
 		if (candidate->work_tree) {
 			free(git_work_tree_cfg);
-			git_work_tree_cfg = candidate->work_tree;
+			git_work_tree_cfg = xstrdup(candidate->work_tree);
 			inside_work_tree = -1;
 		}
-	} else {
-		free(candidate->work_tree);
 	}
 
 	return 0;
 }
 
+static void init_repository_format(struct repository_format *format)
+{
+	const struct repository_format fresh = REPOSITORY_FORMAT_INIT;
+
+	memcpy(format, &fresh, sizeof(fresh));
+}
+
 int read_repository_format(struct repository_format *format, const char *path)
 {
-	memset(format, 0, sizeof(*format));
-	format->version = -1;
-	format->is_bare = -1;
-	format->hash_algo = GIT_HASH_SHA1;
-	string_list_init(&format->unknown_extensions, 1);
+	clear_repository_format(format);
 	git_config_from_file(check_repo_format, path, format);
+	if (format->version == -1)
+		clear_repository_format(format);
 	return format->version;
 }
 
+void clear_repository_format(struct repository_format *format)
+{
+	string_list_clear(&format->unknown_extensions, 0);
+	free(format->work_tree);
+	free(format->partial_clone);
+	init_repository_format(format);
+}
+
 int verify_repository_format(const struct repository_format *format,
 			     struct strbuf *err)
 {
@@ -1008,7 +1019,7 @@ int discover_git_directory(struct strbuf *commondir,
 	struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT;
 	size_t gitdir_offset = gitdir->len, cwd_len;
 	size_t commondir_offset = commondir->len;
-	struct repository_format candidate;
+	struct repository_format candidate = REPOSITORY_FORMAT_INIT;
 
 	if (strbuf_getcwd(&dir))
 		return -1;
@@ -1045,9 +1056,11 @@ int discover_git_directory(struct strbuf *commondir,
 		strbuf_release(&err);
 		strbuf_setlen(commondir, commondir_offset);
 		strbuf_setlen(gitdir, gitdir_offset);
+		clear_repository_format(&candidate);
 		return -1;
 	}
 
+	clear_repository_format(&candidate);
 	return 0;
 }
 
@@ -1056,7 +1069,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 	static struct strbuf cwd = STRBUF_INIT;
 	struct strbuf dir = STRBUF_INIT, gitdir = STRBUF_INIT;
 	const char *prefix;
-	struct repository_format repo_fmt;
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 
 	/*
 	 * We may have read an incomplete configuration before
@@ -1146,6 +1159,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 
 	strbuf_release(&dir);
 	strbuf_release(&gitdir);
+	clear_repository_format(&repo_fmt);
 
 	return prefix;
 }
@@ -1203,9 +1217,10 @@ int git_config_perm(const char *var, const char *value)
 
 void check_repository_format(void)
 {
-	struct repository_format repo_fmt;
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 	check_repository_format_gently(get_git_dir(), &repo_fmt, NULL);
 	startup_info->have_repository = 1;
+	clear_repository_format(&repo_fmt);
 }
 
 /*
diff --git a/worktree.c b/worktree.c
index d6a0ee7f73..b45bfeb9d3 100644
--- a/worktree.c
+++ b/worktree.c
@@ -444,7 +444,7 @@ int submodule_uses_worktrees(const char *path)
 	DIR *dir;
 	struct dirent *d;
 	int ret = 0;
-	struct repository_format format;
+	struct repository_format format = REPOSITORY_FORMAT_INIT;
 
 	submodule_gitdir = git_pathdup_submodule(path, "%s", "");
 	if (!submodule_gitdir)
@@ -462,8 +462,10 @@ int submodule_uses_worktrees(const char *path)
 	read_repository_format(&format, sb.buf);
 	if (format.version != 0) {
 		strbuf_release(&sb);
+		clear_repository_format(&format);
 		return 1;
 	}
+	clear_repository_format(&format);
 
 	/* Replace config by worktrees. */
 	strbuf_setlen(&sb, sb.len - strlen("config"));
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 0/2] setup: fix memory leaks with `struct repository_format`
  2019-02-28 20:36                           ` [PATCH v4 0/2] " Martin Ågren
  2019-02-28 20:36                             ` [PATCH v4 1/2] setup: free old value before setting `work_tree` Martin Ågren
  2019-02-28 20:36                             ` [PATCH v4 2/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
@ 2019-03-06  4:56                             ` Jeff King
  2 siblings, 0 replies; 40+ messages in thread
From: Jeff King @ 2019-03-06  4:56 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, brian m . carlson

On Thu, Feb 28, 2019 at 09:36:26PM +0100, Martin Ågren wrote:

> This is a follow-up to v3 [1] from about a month ago. Patch 1 is
> unchanged; patch 2 provides some additional documentation of the
> initialization that is required, plus I've gotten rid of the compound
> literal. Range-diff below.

Thanks, this version looks good to me.

-Peff

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2019-03-06  4:56 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-18  7:25 [PATCH 0/3] setup: add `clear_repository_format()` Martin Ågren
2018-12-18  7:25 ` [PATCH 1/3] setup: drop return value from `read_repository_format()` Martin Ågren
2018-12-19 15:27   ` Jeff King
2018-12-19 21:42     ` Martin Ågren
2018-12-20  0:17     ` brian m. carlson
2018-12-20  2:52       ` Jeff King
2018-12-20  3:45         ` brian m. carlson
2018-12-20 14:53           ` Jeff King
2018-12-18  7:25 ` [PATCH 2/3] setup: do not use invalid `repository_format` Martin Ågren
2018-12-19  0:18   ` brian m. carlson
2018-12-19 21:43     ` Martin Ågren
2018-12-19 15:38   ` Jeff King
2018-12-19 21:46     ` Martin Ågren
2018-12-19 23:17       ` Jeff King
2018-12-20  0:21     ` brian m. carlson
2018-12-18  7:25 ` [PATCH 3/3] setup: add `clear_repository_format()` Martin Ågren
2018-12-19 15:48   ` Jeff King
2018-12-19 21:49     ` Martin Ågren
2019-01-14 18:34 ` [PATCH v2 0/3] " Martin Ågren
2019-01-14 18:34   ` [PATCH v2 1/3] setup: free old value before setting `work_tree` Martin Ågren
2019-01-14 18:34   ` [PATCH v2 2/3] setup: do not use invalid `repository_format` Martin Ågren
2019-01-15 19:31     ` Jeff King
2019-01-17  6:31       ` Martin Ågren
2019-01-22  7:07         ` Jeff King
2019-01-22 13:34           ` Martin Ågren
2019-01-22 21:45             ` [PATCH v3 0/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
2019-01-22 21:45               ` [PATCH v3 1/2] setup: free old value before setting `work_tree` Martin Ågren
2019-01-22 21:45               ` [PATCH v3 2/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
2019-01-23  5:57                 ` Jeff King
2019-01-24  0:14                   ` brian m. carlson
2019-01-25 19:25                     ` Martin Ågren
2019-01-25 19:24                   ` Martin Ågren
2019-01-25 19:51                     ` Jeff King
2019-02-25 19:21                       ` Martin Ågren
2019-02-26 17:46                         ` Jeff King
2019-02-28 20:36                           ` [PATCH v4 0/2] " Martin Ågren
2019-02-28 20:36                             ` [PATCH v4 1/2] setup: free old value before setting `work_tree` Martin Ågren
2019-02-28 20:36                             ` [PATCH v4 2/2] setup: fix memory leaks with `struct repository_format` Martin Ågren
2019-03-06  4:56                             ` [PATCH v4 0/2] " Jeff King
2019-01-14 18:34   ` [PATCH v2 3/3] setup: add `clear_repository_format()` Martin Ågren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).