git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] fixes for running the test suite with --valgrind
@ 2017-10-03 19:57 Thomas Gummerer
  2017-10-03 19:57 ` [PATCH 1/3] path.c: fix uninitialized memory access Thomas Gummerer
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Thomas Gummerer @ 2017-10-03 19:57 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer

I recently tried to run the git test suite with --valgrind.
Unfortunately it didn't come back completely clean.  This patch series
fixes a bunch of these errors, although unfortunately not quite all of
them yet.

The remaining failures are in

t0021.15 - This one is not actually valgrind complaining about
something, but the clean/smudge script not writing debug.log for some
reason.  I'm not quite sure what exactly is going on here.
t0021.25, t0021.26 - This is an actual uninitialized memory usage:

==4751== Conditional jump or move depends on uninitialised value(s)
==4751==    at 0x2796D5: fill_stat_cache_info (read-cache.c:153)
==4751==    by 0x2218A2: write_entry (entry.c:359)
==4751==    by 0x221D42: checkout_entry (entry.c:458)
==4751==    by 0x2EB627: check_updates (unpack-trees.c:382)
==4751==    by 0x2EDBA1: unpack_trees (unpack-trees.c:1380)
==4751==    by 0x13797E: checkout (clone.c:750)
==4751==    by 0x138FF4: cmd_clone (clone.c:1194)
==4751==    by 0x11A6F2: run_builtin (git.c:342)
==4751==    by 0x11A9E8: handle_builtin (git.c:550)
==4751==    by 0x11ABCC: run_argv (git.c:602)
==4751==    by 0x11AD8E: cmd_main (git.c:679)
==4751==    by 0x1BF125: main (common-main.c:43)
==4751==  Uninitialised value was created by a stack allocation
==4751==    at 0x2212B4: write_entry (entry.c:254)
==4751== 

So far I've tracked this one down to the lstat call in write_entry
failing, and thus not filling struct stat_info.  I'm not quite sure
yet about the best workaround for that (and I'm not very familiar with
the clean/smudge code).  I'll keep digging what the problem there is.

There's also one test that's unexpectedly passing when the test suite
is run under valgrind (t6410.) but I haven't dug into what's up with
that yet.

These patches could be applied by themselves as well, but since they
work toward the same goal, and a cover letter would explain where
these are coming from I decided to make them into a patch series.

Thomas Gummerer (3):
  path.c: fix uninitialized memory access
  http-push: fix construction of hex value from path
  sub-process: allocate argv on the heap

 http-push.c   |  2 +-
 path.c        | 19 ++++++++++---------
 sub-process.c |  4 +++-
 3 files changed, 14 insertions(+), 11 deletions(-)

-- 
2.14.1.480.gb18f417b89


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/3] path.c: fix uninitialized memory access
  2017-10-03 19:57 [PATCH 0/3] fixes for running the test suite with --valgrind Thomas Gummerer
@ 2017-10-03 19:57 ` Thomas Gummerer
  2017-10-03 22:45   ` Jonathan Nieder
  2017-10-03 19:57 ` [PATCH 2/3] http-push: fix construction of hex value from path Thomas Gummerer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Thomas Gummerer @ 2017-10-03 19:57 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy,
	Junio C Hamano

In cleanup_path we're passing in a char array, run a memcmp on it, and
run through it without ever checking if something is in the array in the
first place.  This can lead us to access uninitialized memory, for
example in t5541-http-push-smart.sh test 7, when run under valgrind:

==4423== Conditional jump or move depends on uninitialised value(s)
==4423==    at 0x242FA9: cleanup_path (path.c:35)
==4423==    by 0x242FA9: mkpath (path.c:456)
==4423==    by 0x256CC7: refname_match (refs.c:364)
==4423==    by 0x26C181: count_refspec_match (remote.c:1015)
==4423==    by 0x26C181: match_explicit_lhs (remote.c:1126)
==4423==    by 0x26C181: check_push_refs (remote.c:1409)
==4423==    by 0x2ABB4D: transport_push (transport.c:870)
==4423==    by 0x186703: push_with_options (push.c:332)
==4423==    by 0x18746D: do_push (push.c:409)
==4423==    by 0x18746D: cmd_push (push.c:566)
==4423==    by 0x1183E0: run_builtin (git.c:352)
==4423==    by 0x11973E: handle_builtin (git.c:539)
==4423==    by 0x11973E: run_argv (git.c:593)
==4423==    by 0x11973E: main (git.c:698)
==4423==  Uninitialised value was created by a heap allocation
==4423==    at 0x4C2CD8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4423==    by 0x4C2F195: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4423==    by 0x2C196B: xrealloc (wrapper.c:137)
==4423==    by 0x29A30B: strbuf_grow (strbuf.c:66)
==4423==    by 0x29A30B: strbuf_vaddf (strbuf.c:277)
==4423==    by 0x242F9F: mkpath (path.c:454)
==4423==    by 0x256CC7: refname_match (refs.c:364)
==4423==    by 0x26C181: count_refspec_match (remote.c:1015)
==4423==    by 0x26C181: match_explicit_lhs (remote.c:1126)
==4423==    by 0x26C181: check_push_refs (remote.c:1409)
==4423==    by 0x2ABB4D: transport_push (transport.c:870)
==4423==    by 0x186703: push_with_options (push.c:332)
==4423==    by 0x18746D: do_push (push.c:409)
==4423==    by 0x18746D: cmd_push (push.c:566)
==4423==    by 0x1183E0: run_builtin (git.c:352)
==4423==    by 0x11973E: handle_builtin (git.c:539)
==4423==    by 0x11973E: run_argv (git.c:593)
==4423==    by 0x11973E: main (git.c:698)
==4423==

Avoid this by checking passing in the length of the string in the char
array, and checking that we never run over it.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 path.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/path.c b/path.c
index b533ec938d..12f41ee877 100644
--- a/path.c
+++ b/path.c
@@ -34,20 +34,21 @@ static struct strbuf *get_pathname(void)
 	return sb;
 }
 
-static char *cleanup_path(char *path)
+static char *cleanup_path(char *path, int len)
 {
 	/* Clean it up */
-	if (!memcmp(path, "./", 2)) {
-		path += 2;
-		while (*path == '/')
-			path++;
+	int skip = 0;
+	if (len >= 2 && !memcmp(path, "./", 2)) {
+		skip += 2;
+		while (skip < len && *(path + skip) == '/')
+			skip++;
 	}
-	return path;
+	return path + skip;
 }
 
 static void strbuf_cleanup_path(struct strbuf *sb)
 {
-	char *path = cleanup_path(sb->buf);
+	char *path = cleanup_path(sb->buf, sb->len);
 	if (path > sb->buf)
 		strbuf_remove(sb, 0, path - sb->buf);
 }
@@ -64,7 +65,7 @@ char *mksnpath(char *buf, size_t n, const char *fmt, ...)
 		strlcpy(buf, bad_path, n);
 		return buf;
 	}
-	return cleanup_path(buf);
+	return cleanup_path(buf, n);
 }
 
 static int dir_prefix(const char *buf, const char *dir)
@@ -494,7 +495,7 @@ const char *mkpath(const char *fmt, ...)
 	va_start(args, fmt);
 	strbuf_vaddf(pathname, fmt, args);
 	va_end(args);
-	return cleanup_path(pathname->buf);
+	return cleanup_path(pathname->buf, pathname->len);
 }
 
 const char *worktree_git_path(const struct worktree *wt, const char *fmt, ...)
-- 
2.14.1.480.gb18f417b89


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/3] http-push: fix construction of hex value from path
  2017-10-03 19:57 [PATCH 0/3] fixes for running the test suite with --valgrind Thomas Gummerer
  2017-10-03 19:57 ` [PATCH 1/3] path.c: fix uninitialized memory access Thomas Gummerer
@ 2017-10-03 19:57 ` Thomas Gummerer
  2017-10-03 22:53   ` Jonathan Nieder
  2017-10-03 19:57 ` [PATCH 3/3] sub-process: allocate argv on the heap Thomas Gummerer
  2017-10-03 23:41 ` [PATCH 0/3] fixes for running the test suite with --valgrind Jeff King
  3 siblings, 1 reply; 30+ messages in thread
From: Thomas Gummerer @ 2017-10-03 19:57 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Junio C Hamano, brian m. carlson, Jeff King

The get_oid_hex_from_objpath takes care of creating a oid from a
pathname.  It does this by memcpy'ing the first two bytes of the path to
the "hex" string, then skipping the '/', and then copying the rest of the
path to the "hex" string.  Currently it fails to increase the pointer to
the hex string, so the second memcpy invocation just mashes over what
was copied in the first one, and leaves the last two bytes in the string
uninitialized.

This breaks valgrind in t5540, although the test passes without
valgrind:

==5490== Use of uninitialised value of size 8
==5490==    at 0x13C6B5: hexval (cache.h:1238)
==5490==    by 0x13C6DB: hex2chr (cache.h:1247)
==5490==    by 0x13C734: get_sha1_hex (hex.c:42)
==5490==    by 0x13C78E: get_oid_hex (hex.c:53)
==5490==    by 0x118BDA: get_oid_hex_from_objpath (http-push.c:1023)
==5490==    by 0x118C92: process_ls_object (http-push.c:1038)
==5490==    by 0x118E5B: handle_remote_ls_ctx (http-push.c:1077)
==5490==    by 0x118227: xml_end_tag (http-push.c:815)
==5490==    by 0x50C1448: ??? (in /usr/lib/libexpat.so.1.6.6)
==5490==    by 0x50C221B: ??? (in /usr/lib/libexpat.so.1.6.6)
==5490==    by 0x50BFBF2: ??? (in /usr/lib/libexpat.so.1.6.6)
==5490==    by 0x50C0B24: ??? (in /usr/lib/libexpat.so.1.6.6)
==5490==  Uninitialised value was created by a stack allocation
==5490==    at 0x118B63: get_oid_hex_from_objpath (http-push.c:1012)
==5490==

Fix this by correctly incrementing the pointer to the "hex" variable, so
the first two bytes are left untouched by the memcpy call, and the last
two bytes are correctly initialized.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 http-push.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/http-push.c b/http-push.c
index e4c9b065ce..e9a01ec4da 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1018,7 +1018,7 @@ static int get_oid_hex_from_objpath(const char *path, struct object_id *oid)
 	memcpy(hex, path, 2);
 	path += 2;
 	path++; /* skip '/' */
-	memcpy(hex, path, GIT_SHA1_HEXSZ - 2);
+	memcpy(hex + 2, path, GIT_SHA1_HEXSZ - 2);
 
 	return get_oid_hex(hex, oid);
 }
-- 
2.14.1.480.gb18f417b89


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/3] sub-process: allocate argv on the heap
  2017-10-03 19:57 [PATCH 0/3] fixes for running the test suite with --valgrind Thomas Gummerer
  2017-10-03 19:57 ` [PATCH 1/3] path.c: fix uninitialized memory access Thomas Gummerer
  2017-10-03 19:57 ` [PATCH 2/3] http-push: fix construction of hex value from path Thomas Gummerer
@ 2017-10-03 19:57 ` Thomas Gummerer
  2017-10-03 20:24   ` Johannes Sixt
  2017-10-03 20:25   ` Stefan Beller
  2017-10-03 23:41 ` [PATCH 0/3] fixes for running the test suite with --valgrind Jeff King
  3 siblings, 2 replies; 30+ messages in thread
From: Thomas Gummerer @ 2017-10-03 19:57 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Lars Schneider, Ben Peart, Ben Peart,
	Junio C Hamano, Jakub Narębski

Currently the argv is only allocated on the stack, and then assigned to
process->argv.  When the start_subprocess function goes out of scope,
the local argv variable is eliminated from the stack, but the pointer is
still kept around in process->argv.

Much later when we try to access the same process->argv in
finish_command, this leads us to access a memory location that no longer
contains what we want.  As argv0 is only used for printing errors, this
is not easily noticed in normal git operations.  However when running
t0021-conversion.sh through valgrind, valgrind rightfully complains:

==21024== Invalid read of size 8
==21024==    at 0x2ACF64: finish_command (run-command.c:869)
==21024==    by 0x2D6B18: subprocess_exit_handler (sub-process.c:72)
==21024==    by 0x2AB41E: cleanup_children (run-command.c:45)
==21024==    by 0x2AB526: cleanup_children_on_exit (run-command.c:81)
==21024==    by 0x54AD487: __run_exit_handlers (in /usr/lib/libc-2.26.so)
==21024==    by 0x54AD4D9: exit (in /usr/lib/libc-2.26.so)
==21024==    by 0x11A9EF: handle_builtin (git.c:550)
==21024==    by 0x11ABCC: run_argv (git.c:602)
==21024==    by 0x11AD8E: cmd_main (git.c:679)
==21024==    by 0x1BF125: main (common-main.c:43)
==21024==  Address 0x1ffeffec00 is on thread 1's stack
==21024==  1504 bytes below stack pointer
==21024==

Fix this by allocating the memory on properly on the heap.  This memory
is allocated on the heap, and never free'd.  However the same seems to be
true for struct child_process, so it should be fine to just let the
memory be free'd when the process terminates.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 sub-process.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sub-process.c b/sub-process.c
index 6dde5062be..4680af8193 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -77,7 +77,9 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
 {
 	int err;
 	struct child_process *process;
-	const char *argv[] = { cmd, NULL };
+	const char **argv = xmalloc(2 * sizeof(char *));
+	argv[0] = cmd;
+	argv[1] = NULL;
 
 	entry->cmd = cmd;
 	process = &entry->process;
-- 
2.14.1.480.gb18f417b89


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] sub-process: allocate argv on the heap
  2017-10-03 19:57 ` [PATCH 3/3] sub-process: allocate argv on the heap Thomas Gummerer
@ 2017-10-03 20:24   ` Johannes Sixt
  2017-10-04  4:59     ` Junio C Hamano
  2017-10-03 20:25   ` Stefan Beller
  1 sibling, 1 reply; 30+ messages in thread
From: Johannes Sixt @ 2017-10-03 20:24 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Lars Schneider, Ben Peart, Ben Peart, Junio C Hamano,
	Jakub Narębski

Am 03.10.2017 um 21:57 schrieb Thomas Gummerer:
> diff --git a/sub-process.c b/sub-process.c
> index 6dde5062be..4680af8193 100644
> --- a/sub-process.c
> +++ b/sub-process.c
> @@ -77,7 +77,9 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
>   {
>   	int err;
>   	struct child_process *process;
> -	const char *argv[] = { cmd, NULL };
> +	const char **argv = xmalloc(2 * sizeof(char *));
> +	argv[0] = cmd;
> +	argv[1] = NULL;
>   
>   	entry->cmd = cmd;
>   	process = &entry->process;
> 

Perhaps this should become

	argv_array_push(&process->args, cmd);

so that there is no new memory leak?

-- Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] sub-process: allocate argv on the heap
  2017-10-03 19:57 ` [PATCH 3/3] sub-process: allocate argv on the heap Thomas Gummerer
  2017-10-03 20:24   ` Johannes Sixt
@ 2017-10-03 20:25   ` Stefan Beller
  1 sibling, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2017-10-03 20:25 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git@vger.kernel.org, Lars Schneider, Ben Peart, Ben Peart,
	Junio C Hamano, Jakub Narębski

On Tue, Oct 3, 2017 at 12:57 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Currently the argv is only allocated on the stack, and then assigned to
> process->argv.  When the start_subprocess function goes out of scope,
> the local argv variable is eliminated from the stack, but the pointer is
> still kept around in process->argv.
>
> Much later when we try to access the same process->argv in
> finish_command, this leads us to access a memory location that no longer
> contains what we want.  As argv0 is only used for printing errors, this
> is not easily noticed in normal git operations.  However when running
> t0021-conversion.sh through valgrind, valgrind rightfully complains:
>
> ==21024== Invalid read of size 8
> ==21024==    at 0x2ACF64: finish_command (run-command.c:869)
> ==21024==    by 0x2D6B18: subprocess_exit_handler (sub-process.c:72)
> ==21024==    by 0x2AB41E: cleanup_children (run-command.c:45)
> ==21024==    by 0x2AB526: cleanup_children_on_exit (run-command.c:81)
> ==21024==    by 0x54AD487: __run_exit_handlers (in /usr/lib/libc-2.26.so)
> ==21024==    by 0x54AD4D9: exit (in /usr/lib/libc-2.26.so)
> ==21024==    by 0x11A9EF: handle_builtin (git.c:550)
> ==21024==    by 0x11ABCC: run_argv (git.c:602)
> ==21024==    by 0x11AD8E: cmd_main (git.c:679)
> ==21024==    by 0x1BF125: main (common-main.c:43)
> ==21024==  Address 0x1ffeffec00 is on thread 1's stack
> ==21024==  1504 bytes below stack pointer
> ==21024==
>
> Fix this by allocating the memory on properly on the heap.  This memory
> is allocated on the heap, and never free'd.  However the same seems to be
> true for struct child_process, so it should be fine to just let the
> memory be free'd when the process terminates.

Uh. :( The broken window theory at work.

The patch below seems correct, but as you eluded to, now we'd be
leaking memory. The run_command API has two fields 'char **argv'
and 'argv_array args'. The argv is kept around for historical reasons
as well as when the caller wants to be in control of the array (the caller
needs to free the memory, but could also just reuse it for a slightly
different invocation), whereas the args argument is owned by the child
process, such that the memory is freed by finish_command.

As we're doing a memory allocation now anyway, how about:

-       const char *argv[] = { cmd, NULL };
...
    child_process_init(process);
+    argv_array_push(process.args, cmd);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] path.c: fix uninitialized memory access
  2017-10-03 19:57 ` [PATCH 1/3] path.c: fix uninitialized memory access Thomas Gummerer
@ 2017-10-03 22:45   ` Jonathan Nieder
  2017-10-03 23:30     ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Jonathan Nieder @ 2017-10-03 22:45 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, Nguyễn Thái Ngọc Duy, Junio C Hamano

Hi,

Thomas Gummerer wrote:

> In cleanup_path we're passing in a char array, run a memcmp on it, and
> run through it without ever checking if something is in the array in the
> first place.  This can lead us to access uninitialized memory, for
> example in t5541-http-push-smart.sh test 7, when run under valgrind:
[...]
> Avoid this by checking passing in the length of the string in the char
> array, and checking that we never run over it.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  path.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)

When I first read the above, I thought it was going to be about a
NUL-terminated string that was missing a NUL.  But in fact, the issue
is that strlen(path) can be < 2.

In other words, an alternative fix would be

	if (*path == '.' && path[1] == '/') {
		...
	}

which would not require passing in 'len' or switching to index-based
arithmetic.  I think I prefer it.  What do you think?

Thanks and hope that helps,
Jonathan

diff --git i/path.c w/path.c
index b533ec938d..3a1fbee1e0 100644
--- i/path.c
+++ w/path.c
@@ -37,7 +37,7 @@ static struct strbuf *get_pathname(void)
 static char *cleanup_path(char *path)
 {
 	/* Clean it up */
-	if (!memcmp(path, "./", 2)) {
+	if (*path == '.' && path[1] == '/') {
 		path += 2;
 		while (*path == '/')
 			path++;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] http-push: fix construction of hex value from path
  2017-10-03 19:57 ` [PATCH 2/3] http-push: fix construction of hex value from path Thomas Gummerer
@ 2017-10-03 22:53   ` Jonathan Nieder
  2017-10-03 23:36     ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Jonathan Nieder @ 2017-10-03 22:53 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git, Junio C Hamano, brian m. carlson, Jeff King

Hi,

Thomas Gummerer wrote:

> The get_oid_hex_from_objpath takes care of creating a oid from a
> pathname.  It does this by memcpy'ing the first two bytes of the path to
> the "hex" string, then skipping the '/', and then copying the rest of the
> path to the "hex" string.  Currently it fails to increase the pointer to
> the hex string, so the second memcpy invocation just mashes over what
> was copied in the first one, and leaves the last two bytes in the string
> uninitialized.

Wow.  The fix is obviously correct.

> This breaks valgrind in t5540, although the test passes without
> valgrind:
[...]
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  http-push.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Would it be straightforward to add a correctness test for this?  It
seems like this code path didn't work at all and no one noticed.

This is the code path in http-push.c which says

 /*
  * NEEDSWORK: remote_ls() ignores info/refs on the remote side.  But it
  * should _only_ heed the information from that file, instead of trying to
  * determine the refs from the remote file system (badly: it does not even
  * know about packed-refs).
  */
 static void remote_ls(const char *path, int flags,

I think the problem is that when it fails, we end up thinking that
there are *fewer* objects than are actually present remotely so the
only ill effect is pushing too much.  So this should be observable in
server logs (i.e. it is testable) but it's not a catastrophic failure
which means it's harder to test than it would be otherwise.

Moreover, this is in the webdav-based "dumb http" push code path,
which I do not trust much at all.  I wonder if we could retire it
completely (or at least provide an option to turn it off).

> diff --git a/http-push.c b/http-push.c
> index e4c9b065ce..e9a01ec4da 100644
> --- a/http-push.c
> +++ b/http-push.c
> @@ -1018,7 +1018,7 @@ static int get_oid_hex_from_objpath(const char *path, struct object_id *oid)
>  	memcpy(hex, path, 2);
>  	path += 2;
>  	path++; /* skip '/' */
> -	memcpy(hex, path, GIT_SHA1_HEXSZ - 2);
> +	memcpy(hex + 2, path, GIT_SHA1_HEXSZ - 2);
>  
>  	return get_oid_hex(hex, oid);

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] path.c: fix uninitialized memory access
  2017-10-03 22:45   ` Jonathan Nieder
@ 2017-10-03 23:30     ` Jeff King
  2017-10-03 23:37       ` Jonathan Nieder
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-10-03 23:30 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Thomas Gummerer, git, Nguyễn Thái Ngọc Duy,
	Junio C Hamano

On Tue, Oct 03, 2017 at 03:45:01PM -0700, Jonathan Nieder wrote:

> When I first read the above, I thought it was going to be about a
> NUL-terminated string that was missing a NUL.  But in fact, the issue
> is that strlen(path) can be < 2.
> 
> In other words, an alternative fix would be
> 
> 	if (*path == '.' && path[1] == '/') {
> 		...
> 	}
> 
> which would not require passing in 'len' or switching to index-based
> arithmetic.  I think I prefer it.  What do you think?

Yes, I think that approach is much nicer. I think you could even use
skip_prefix. Unfortunately you have to play a few games with const-ness,
but I think the resulting signature for cleanup_path() is an
improvement:

diff --git a/path.c b/path.c
index 00ec04e7a5..2e09a7bce0 100644
--- a/path.c
+++ b/path.c
@@ -34,11 +34,10 @@ static struct strbuf *get_pathname(void)
 	return sb;
 }
 
-static char *cleanup_path(char *path)
+static const char *cleanup_path(const char *path)
 {
 	/* Clean it up */
-	if (!memcmp(path, "./", 2)) {
-		path += 2;
+	if (skip_prefix(path, "./", &path)) {
 		while (*path == '/')
 			path++;
 	}
@@ -47,7 +46,7 @@ static char *cleanup_path(char *path)
 
 static void strbuf_cleanup_path(struct strbuf *sb)
 {
-	char *path = cleanup_path(sb->buf);
+	const char *path = cleanup_path(sb->buf);
 	if (path > sb->buf)
 		strbuf_remove(sb, 0, path - sb->buf);
 }
@@ -64,7 +63,7 @@ char *mksnpath(char *buf, size_t n, const char *fmt, ...)
 		strlcpy(buf, bad_path, n);
 		return buf;
 	}
-	return cleanup_path(buf);
+	return (char *)cleanup_path(buf);
 }
 
 static int dir_prefix(const char *buf, const char *dir)

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] http-push: fix construction of hex value from path
  2017-10-03 22:53   ` Jonathan Nieder
@ 2017-10-03 23:36     ` Jeff King
  2017-10-04  4:48       ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-10-03 23:36 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Thomas Gummerer, git, Junio C Hamano, brian m. carlson

On Tue, Oct 03, 2017 at 03:53:15PM -0700, Jonathan Nieder wrote:

> Thomas Gummerer wrote:
> 
> > The get_oid_hex_from_objpath takes care of creating a oid from a
> > pathname.  It does this by memcpy'ing the first two bytes of the path to
> > the "hex" string, then skipping the '/', and then copying the rest of the
> > path to the "hex" string.  Currently it fails to increase the pointer to
> > the hex string, so the second memcpy invocation just mashes over what
> > was copied in the first one, and leaves the last two bytes in the string
> > uninitialized.
> 
> Wow.  The fix is obviously correct.

Agreed. This is brown-paper-bag worthy. Thanks, Thomas, for cleaning up
my mess.

> I think the problem is that when it fails, we end up thinking that
> there are *fewer* objects than are actually present remotely so the
> only ill effect is pushing too much.  So this should be observable in
> server logs (i.e. it is testable) but it's not a catastrophic failure
> which means it's harder to test than it would be otherwise.

And thank you, Jonathan, for this analysis. I had also wondered how such
a frequently-triggered bug could have gone completely unnoticed, but
this explains it.

> Moreover, this is in the webdav-based "dumb http" push code path,
> which I do not trust much at all.  I wonder if we could retire it
> completely (or at least provide an option to turn it off).

I would really like that, too. It has been the cause of a lot of pain
when working with the smart code, and I am not at all surprised to find
a bug of this magnitude lurking in it. I'd _hoped_ this could show that
the system has been unusably broken for years, which would give us
confidence to turn it off. :) But per your paragraph above, people could
very easily still have been happily using it in the meantime.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] path.c: fix uninitialized memory access
  2017-10-03 23:30     ` Jeff King
@ 2017-10-03 23:37       ` Jonathan Nieder
  2017-10-04  4:47         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jonathan Nieder @ 2017-10-03 23:37 UTC (permalink / raw)
  To: Jeff King
  Cc: Thomas Gummerer, git, Nguyễn Thái Ngọc Duy,
	Junio C Hamano

Jeff King wrote:
> On Tue, Oct 03, 2017 at 03:45:01PM -0700, Jonathan Nieder wrote:

>> In other words, an alternative fix would be
>> 
>> 	if (*path == '.' && path[1] == '/') {
>> 		...
>> 	}
>> 
>> which would not require passing in 'len' or switching to index-based
>> arithmetic.  I think I prefer it.  What do you think?
>
> Yes, I think that approach is much nicer. I think you could even use
> skip_prefix. Unfortunately you have to play a few games with const-ness,
> but I think the resulting signature for cleanup_path() is an
> improvement:

Ooh!

For what it's worth, if you add a commit message with Thomas's
Reported-by then this lgtm.

Thanks,
Jonathan

> diff --git a/path.c b/path.c
> index 00ec04e7a5..2e09a7bce0 100644
> --- a/path.c
> +++ b/path.c
> @@ -34,11 +34,10 @@ static struct strbuf *get_pathname(void)
>  	return sb;
>  }
>  
> -static char *cleanup_path(char *path)
> +static const char *cleanup_path(const char *path)
>  {
>  	/* Clean it up */
> -	if (!memcmp(path, "./", 2)) {
> -		path += 2;
> +	if (skip_prefix(path, "./", &path)) {
>  		while (*path == '/')
>  			path++;
>  	}
> @@ -47,7 +46,7 @@ static char *cleanup_path(char *path)
>  
>  static void strbuf_cleanup_path(struct strbuf *sb)
>  {
> -	char *path = cleanup_path(sb->buf);
> +	const char *path = cleanup_path(sb->buf);
>  	if (path > sb->buf)
>  		strbuf_remove(sb, 0, path - sb->buf);
>  }
> @@ -64,7 +63,7 @@ char *mksnpath(char *buf, size_t n, const char *fmt, ...)
>  		strlcpy(buf, bad_path, n);
>  		return buf;
>  	}
> -	return cleanup_path(buf);
> +	return (char *)cleanup_path(buf);
>  }
>  
>  static int dir_prefix(const char *buf, const char *dir)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] fixes for running the test suite with --valgrind
  2017-10-03 19:57 [PATCH 0/3] fixes for running the test suite with --valgrind Thomas Gummerer
                   ` (2 preceding siblings ...)
  2017-10-03 19:57 ` [PATCH 3/3] sub-process: allocate argv on the heap Thomas Gummerer
@ 2017-10-03 23:41 ` Jeff King
  2017-10-03 23:50   ` Jonathan Nieder
  2017-10-04 10:19   ` playing with MSan, was " Jeff King
  3 siblings, 2 replies; 30+ messages in thread
From: Jeff King @ 2017-10-03 23:41 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

On Tue, Oct 03, 2017 at 08:57:10PM +0100, Thomas Gummerer wrote:

> I recently tried to run the git test suite with --valgrind.
> Unfortunately it didn't come back completely clean.  This patch series
> fixes a bunch of these errors, although unfortunately not quite all of
> them yet.

Thanks for trying that. I've largely switched to ASan over valgrind
because it runs _so_ much faster. Unfortunately it seems to miss
these uninitialized-memory cases.

I think using SANITIZE=memory would catch these, but it needs some
suppressions tuning. The weird "zlib reads uninitialized memory" error
is a problem (valgrind sees this, too, but we have suppressions).

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] fixes for running the test suite with --valgrind
  2017-10-03 23:41 ` [PATCH 0/3] fixes for running the test suite with --valgrind Jeff King
@ 2017-10-03 23:50   ` Jonathan Nieder
  2017-10-03 23:54     ` Jeff King
  2017-10-04 10:19   ` playing with MSan, was " Jeff King
  1 sibling, 1 reply; 30+ messages in thread
From: Jonathan Nieder @ 2017-10-03 23:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Gummerer, git

Jeff King wrote:

> I think using SANITIZE=memory would catch these, but it needs some
> suppressions tuning. The weird "zlib reads uninitialized memory" error
> is a problem (valgrind sees this, too, but we have suppressions).

What version of zlib do you use?  I've heard some good things about
v1.2.11 improving matters, though I haven't checked yet.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] fixes for running the test suite with --valgrind
  2017-10-03 23:50   ` Jonathan Nieder
@ 2017-10-03 23:54     ` Jeff King
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-10-03 23:54 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Thomas Gummerer, git

On Tue, Oct 03, 2017 at 04:50:58PM -0700, Jonathan Nieder wrote:

> > I think using SANITIZE=memory would catch these, but it needs some
> > suppressions tuning. The weird "zlib reads uninitialized memory" error
> > is a problem (valgrind sees this, too, but we have suppressions).
> 
> What version of zlib do you use?  I've heard some good things about
> v1.2.11 improving matters, though I haven't checked yet.

1.2.8.dfsg-5, from Debian unstable. I'm happy to try v1.2.11 once it
hits unstable or experimental. :)

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] path.c: fix uninitialized memory access
  2017-10-03 23:37       ` Jonathan Nieder
@ 2017-10-04  4:47         ` Junio C Hamano
  2017-10-04  5:21           ` Jeff King
                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Junio C Hamano @ 2017-10-04  4:47 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Jeff King, Thomas Gummerer, git,
	Nguyễn Thái Ngọc Duy

Jonathan Nieder <jrnieder@gmail.com> writes:

> Jeff King wrote:
>> On Tue, Oct 03, 2017 at 03:45:01PM -0700, Jonathan Nieder wrote:
>
>>> In other words, an alternative fix would be
>>> 
>>> 	if (*path == '.' && path[1] == '/') {
>>> 		...
>>> 	}
>>> 
>>> which would not require passing in 'len' or switching to index-based
>>> arithmetic.  I think I prefer it.  What do you think?
>>
>> Yes, I think that approach is much nicer. I think you could even use
>> skip_prefix. Unfortunately you have to play a few games with const-ness,
>> but I think the resulting signature for cleanup_path() is an
>> improvement:

To tie the loose end, here is what I'll queue.

-- >8 --
From: Jeff King <peff@peff.net>
Date: Tue, 3 Oct 2017 19:30:40 -0400
Subject: [PATCH] path.c: fix uninitialized memory access

In cleanup_path we're passing in a char array, run a memcmp on it, and
run through it without ever checking if something is in the array in the
first place.  This can lead us to access uninitialized memory, for
example in t5541-http-push-smart.sh test 7, when run under valgrind:

==4423== Conditional jump or move depends on uninitialised value(s)
==4423==    at 0x242FA9: cleanup_path (path.c:35)
==4423==    by 0x242FA9: mkpath (path.c:456)
==4423==    by 0x256CC7: refname_match (refs.c:364)
==4423==    by 0x26C181: count_refspec_match (remote.c:1015)
==4423==    by 0x26C181: match_explicit_lhs (remote.c:1126)
==4423==    by 0x26C181: check_push_refs (remote.c:1409)
==4423==    by 0x2ABB4D: transport_push (transport.c:870)
==4423==    by 0x186703: push_with_options (push.c:332)
==4423==    by 0x18746D: do_push (push.c:409)
==4423==    by 0x18746D: cmd_push (push.c:566)
==4423==    by 0x1183E0: run_builtin (git.c:352)
==4423==    by 0x11973E: handle_builtin (git.c:539)
==4423==    by 0x11973E: run_argv (git.c:593)
==4423==    by 0x11973E: main (git.c:698)
==4423==  Uninitialised value was created by a heap allocation
==4423==    at 0x4C2CD8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4423==    by 0x4C2F195: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4423==    by 0x2C196B: xrealloc (wrapper.c:137)
==4423==    by 0x29A30B: strbuf_grow (strbuf.c:66)
==4423==    by 0x29A30B: strbuf_vaddf (strbuf.c:277)
==4423==    by 0x242F9F: mkpath (path.c:454)
==4423==    by 0x256CC7: refname_match (refs.c:364)
==4423==    by 0x26C181: count_refspec_match (remote.c:1015)
==4423==    by 0x26C181: match_explicit_lhs (remote.c:1126)
==4423==    by 0x26C181: check_push_refs (remote.c:1409)
==4423==    by 0x2ABB4D: transport_push (transport.c:870)
==4423==    by 0x186703: push_with_options (push.c:332)
==4423==    by 0x18746D: do_push (push.c:409)
==4423==    by 0x18746D: cmd_push (push.c:566)
==4423==    by 0x1183E0: run_builtin (git.c:352)
==4423==    by 0x11973E: handle_builtin (git.c:539)
==4423==    by 0x11973E: run_argv (git.c:593)
==4423==    by 0x11973E: main (git.c:698)
==4423==

Avoid this by using skip_prefix(), which knows not to go beyond the
end of the string.

Reported-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 path.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/path.c b/path.c
index e50d2befcf..2fecf854fe 100644
--- a/path.c
+++ b/path.c
@@ -33,11 +33,10 @@ static struct strbuf *get_pathname(void)
 	return sb;
 }
 
-static char *cleanup_path(char *path)
+static const char *cleanup_path(const char *path)
 {
 	/* Clean it up */
-	if (!memcmp(path, "./", 2)) {
-		path += 2;
+	if (skip_prefix(path, "./", &path)) {
 		while (*path == '/')
 			path++;
 	}
@@ -46,7 +45,7 @@ static char *cleanup_path(char *path)
 
 static void strbuf_cleanup_path(struct strbuf *sb)
 {
-	char *path = cleanup_path(sb->buf);
+	const char *path = cleanup_path(sb->buf);
 	if (path > sb->buf)
 		strbuf_remove(sb, 0, path - sb->buf);
 }
@@ -63,7 +62,7 @@ char *mksnpath(char *buf, size_t n, const char *fmt, ...)
 		strlcpy(buf, bad_path, n);
 		return buf;
 	}
-	return cleanup_path(buf);
+	return (char *)cleanup_path(buf);
 }
 
 static int dir_prefix(const char *buf, const char *dir)
-- 
2.14.2-889-gd2948f6aa6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] http-push: fix construction of hex value from path
  2017-10-03 23:36     ` Jeff King
@ 2017-10-04  4:48       ` Junio C Hamano
  2017-10-04  5:20         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2017-10-04  4:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Thomas Gummerer, git, brian m. carlson

Jeff King <peff@peff.net> writes:

>> Moreover, this is in the webdav-based "dumb http" push code path,
>> which I do not trust much at all.  I wonder if we could retire it
>> completely (or at least provide an option to turn it off).
>
> I would really like that, too. It has been the cause of a lot of pain
> when working with the smart code, and I am not at all surprised to find
> a bug of this magnitude lurking in it. I'd _hoped_ this could show that
> the system has been unusably broken for years, which would give us
> confidence to turn it off. :) But per your paragraph above, people could
> very easily still have been happily using it in the meantime.

Same here.  Perhaps we should deliberately and silently break it and
see who screams?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] sub-process: allocate argv on the heap
  2017-10-03 20:24   ` Johannes Sixt
@ 2017-10-04  4:59     ` Junio C Hamano
  2017-10-04  5:32       ` Jeff King
                         ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Junio C Hamano @ 2017-10-04  4:59 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Thomas Gummerer, git, Lars Schneider, Ben Peart, Ben Peart,
	Jakub Narębski

Johannes Sixt <j6t@kdbg.org> writes:

> Am 03.10.2017 um 21:57 schrieb Thomas Gummerer:
>> diff --git a/sub-process.c b/sub-process.c
>> index 6dde5062be..4680af8193 100644
>> --- a/sub-process.c
>> +++ b/sub-process.c
>> @@ -77,7 +77,9 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
>>   {
>>   	int err;
>>   	struct child_process *process;
>> -	const char *argv[] = { cmd, NULL };
>> +	const char **argv = xmalloc(2 * sizeof(char *));
>> +	argv[0] = cmd;
>> +	argv[1] = NULL;
>>     	entry->cmd = cmd;
>>   	process = &entry->process;
>>
>
> Perhaps this should become
>
> 	argv_array_push(&process->args, cmd);
>
> so that there is no new memory leak?

Sounds like a good idea (if I am not grossly mistaken as to what is
being suggested).

Here is what I am planning to queue.

-- >8 --
From: Johannes Sixt <j6t@kdbg.org>
Date: Tue, 3 Oct 2017 22:24:57 +0200
Subject: [PATCH] sub-process: use child_process.args instead of child_process.argv

Currently the argv is only allocated on the stack, and then assigned to
process->argv.  When the start_subprocess function goes out of scope,
the local argv variable is eliminated from the stack, but the pointer is
still kept around in process->argv.

Much later when we try to access the same process->argv in
finish_command, this leads us to access a memory location that no longer
contains what we want.  As argv0 is only used for printing errors, this
is not easily noticed in normal git operations.  However when running
t0021-conversion.sh through valgrind, valgrind rightfully complains:

==21024== Invalid read of size 8
==21024==    at 0x2ACF64: finish_command (run-command.c:869)
==21024==    by 0x2D6B18: subprocess_exit_handler (sub-process.c:72)
==21024==    by 0x2AB41E: cleanup_children (run-command.c:45)
==21024==    by 0x2AB526: cleanup_children_on_exit (run-command.c:81)
==21024==    by 0x54AD487: __run_exit_handlers (in /usr/lib/libc-2.26.so)
==21024==    by 0x54AD4D9: exit (in /usr/lib/libc-2.26.so)
==21024==    by 0x11A9EF: handle_builtin (git.c:550)
==21024==    by 0x11ABCC: run_argv (git.c:602)
==21024==    by 0x11AD8E: cmd_main (git.c:679)
==21024==    by 0x1BF125: main (common-main.c:43)
==21024==  Address 0x1ffeffec00 is on thread 1's stack
==21024==  1504 bytes below stack pointer
==21024==

These days, the child_process structure has its own args array, and
the standard way to set up its argv[] is to use that one, instead of
assigning to process->argv to point at an array that is outside.
Use that facility automatically fixes this issue.

Reported-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 sub-process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sub-process.c b/sub-process.c
index fcc4832c14..648b3a3943 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -74,13 +74,12 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
 {
 	int err;
 	struct child_process *process;
-	const char *argv[] = { cmd, NULL };
 
 	entry->cmd = cmd;
 	process = &entry->process;
 
 	child_process_init(process);
-	process->argv = argv;
+	argv_array_push(&process->args, cmd);
 	process->use_shell = 1;
 	process->in = -1;
 	process->out = -1;
-- 
2.14.2-889-gd2948f6aa6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] http-push: fix construction of hex value from path
  2017-10-04  4:48       ` Junio C Hamano
@ 2017-10-04  5:20         ` Junio C Hamano
  2017-10-04  5:26           ` Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2017-10-04  5:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Thomas Gummerer, git, brian m. carlson

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>>> Moreover, this is in the webdav-based "dumb http" push code path,
>>> which I do not trust much at all.  I wonder if we could retire it
>>> completely (or at least provide an option to turn it off).
>>
>> I would really like that, too. It has been the cause of a lot of pain
>> when working with the smart code, and I am not at all surprised to find
>> a bug of this magnitude lurking in it. I'd _hoped_ this could show that
>> the system has been unusably broken for years, which would give us
>> confidence to turn it off. :) But per your paragraph above, people could
>> very easily still have been happily using it in the meantime.
>
> Same here.  Perhaps we should deliberately and silently break it and
> see who screams?

Hopefully it should be obvious but just for people with unreasonable
expectations, I should clarify that the above needs a smiley ;-).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] path.c: fix uninitialized memory access
  2017-10-04  4:47         ` Junio C Hamano
@ 2017-10-04  5:21           ` Jeff King
  2017-10-04 19:22           ` Thomas Gummerer
  2017-10-04 19:36           ` Jonathan Nieder
  2 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-10-04  5:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Thomas Gummerer, git,
	Nguyễn Thái Ngọc Duy

On Wed, Oct 04, 2017 at 01:47:29PM +0900, Junio C Hamano wrote:

> Jonathan Nieder <jrnieder@gmail.com> writes:
> 
> > Jeff King wrote:
> >> On Tue, Oct 03, 2017 at 03:45:01PM -0700, Jonathan Nieder wrote:
> >
> >>> In other words, an alternative fix would be
> >>> 
> >>> 	if (*path == '.' && path[1] == '/') {
> >>> 		...
> >>> 	}
> >>> 
> >>> which would not require passing in 'len' or switching to index-based
> >>> arithmetic.  I think I prefer it.  What do you think?
> >>
> >> Yes, I think that approach is much nicer. I think you could even use
> >> skip_prefix. Unfortunately you have to play a few games with const-ness,
> >> but I think the resulting signature for cleanup_path() is an
> >> improvement:
> 
> To tie the loose end, here is what I'll queue.

Thank you, I was planning to get to this later tonight, but now I don't
have to. :)

FWIW, I wondered if we could get rid of the extra cast by switching
cleanup_path() to return an offset rather than a string (which also
makes its interface more foolproof, since it's clear that the return
value is a subset of the original string).

But it ends up being a bit clunky I think (patch below for reference).

I guess it's possible that `cleanup_path` could learn to do other
cleanup, too (e.g., to clean up doubled slashes in the middle of the
path), in which case it really would want a non-const buffer. But since
it has remained unchanged since 26c8a533af (Add "mkpath()" helper
function, 2005-07-08), I'm happy to assume it will remain so for another
12 years.

All of which is to say:

> -- >8 --
> From: Jeff King <peff@peff.net>
> Date: Tue, 3 Oct 2017 19:30:40 -0400
> Subject: [PATCH] path.c: fix uninitialized memory access

Looks good to me.

-Peff

-- >8 --
diff --git a/path.c b/path.c
index 5aa9244eb2..eaeb9d9a17 100644
--- a/path.c
+++ b/path.c
@@ -34,22 +34,24 @@ static struct strbuf *get_pathname(void)
 	return sb;
 }
 
-static char *cleanup_path(char *path)
+static size_t cleanup_path(const char *path)
 {
-	/* Clean it up */
-	if (!memcmp(path, "./", 2)) {
-		path += 2;
-		while (*path == '/')
-			path++;
-	}
-	return path;
+	const char *clean;
+
+	if (!skip_prefix(path, "./", &clean))
+		return 0;
+
+	while (*clean == '/')
+		clean++;
+
+	return clean - path;
 }
 
 static void strbuf_cleanup_path(struct strbuf *sb)
 {
-	char *path = cleanup_path(sb->buf);
-	if (path > sb->buf)
-		strbuf_remove(sb, 0, path - sb->buf);
+	size_t s = cleanup_path(sb->buf);
+	if (s)
+		strbuf_remove(sb, 0, s);
 }
 
 char *mksnpath(char *buf, size_t n, const char *fmt, ...)
@@ -64,7 +66,7 @@ char *mksnpath(char *buf, size_t n, const char *fmt, ...)
 		strlcpy(buf, bad_path, n);
 		return buf;
 	}
-	return cleanup_path(buf);
+	return buf + cleanup_path(buf);
 }
 
 static int dir_prefix(const char *buf, const char *dir)

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] http-push: fix construction of hex value from path
  2017-10-04  5:20         ` Junio C Hamano
@ 2017-10-04  5:26           ` Jeff King
  2017-10-04  6:26             ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-10-04  5:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Thomas Gummerer, git, brian m. carlson

On Wed, Oct 04, 2017 at 02:20:05PM +0900, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Jeff King <peff@peff.net> writes:
> >
> >>> Moreover, this is in the webdav-based "dumb http" push code path,
> >>> which I do not trust much at all.  I wonder if we could retire it
> >>> completely (or at least provide an option to turn it off).
> >>
> >> I would really like that, too. It has been the cause of a lot of pain
> >> when working with the smart code, and I am not at all surprised to find
> >> a bug of this magnitude lurking in it. I'd _hoped_ this could show that
> >> the system has been unusably broken for years, which would give us
> >> confidence to turn it off. :) But per your paragraph above, people could
> >> very easily still have been happily using it in the meantime.
> >
> > Same here.  Perhaps we should deliberately and silently break it and
> > see who screams?
> 
> Hopefully it should be obvious but just for people with unreasonable
> expectations, I should clarify that the above needs a smiley ;-).

Yes, I was surprised to see it without one. :)

More seriously, is there any interest in marking it as deprecated in the
release notes and issuing a warning when it's used for a few cycles?

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] sub-process: allocate argv on the heap
  2017-10-04  4:59     ` Junio C Hamano
@ 2017-10-04  5:32       ` Jeff King
  2017-10-04  5:58       ` Johannes Sixt
  2017-10-04 19:31       ` Thomas Gummerer
  2 siblings, 0 replies; 30+ messages in thread
From: Jeff King @ 2017-10-04  5:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, Thomas Gummerer, git, Lars Schneider, Ben Peart,
	Ben Peart, Jakub Narębski

On Wed, Oct 04, 2017 at 01:59:31PM +0900, Junio C Hamano wrote:

> > Perhaps this should become
> >
> > 	argv_array_push(&process->args, cmd);
> >
> > so that there is no new memory leak?
> 
> Sounds like a good idea (if I am not grossly mistaken as to what is
> being suggested).
> 
> Here is what I am planning to queue.
> 
> -- >8 --
> From: Johannes Sixt <j6t@kdbg.org>
> Date: Tue, 3 Oct 2017 22:24:57 +0200
> Subject: [PATCH] sub-process: use child_process.args instead of child_process.argv

This looks good (and is exactly the type of case for which I added
"args" to the child_process in the first place). The commit message
is well-explained and the patch looks obviously correct.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] sub-process: allocate argv on the heap
  2017-10-04  4:59     ` Junio C Hamano
  2017-10-04  5:32       ` Jeff King
@ 2017-10-04  5:58       ` Johannes Sixt
  2017-10-04 19:31       ` Thomas Gummerer
  2 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2017-10-04  5:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Thomas Gummerer, git, Lars Schneider, Ben Peart, Ben Peart,
	Jakub Narębski

Am 04.10.2017 um 06:59 schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>> Am 03.10.2017 um 21:57 schrieb Thomas Gummerer:
>>> diff --git a/sub-process.c b/sub-process.c
>>> index 6dde5062be..4680af8193 100644
>>> --- a/sub-process.c
>>> +++ b/sub-process.c
>>> @@ -77,7 +77,9 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
>>>    {
>>>    	int err;
>>>    	struct child_process *process;
>>> -	const char *argv[] = { cmd, NULL };
>>> +	const char **argv = xmalloc(2 * sizeof(char *));
>>> +	argv[0] = cmd;
>>> +	argv[1] = NULL;
>>>      	entry->cmd = cmd;
>>>    	process = &entry->process;
>>>
>>
>> Perhaps this should become
>>
>> 	argv_array_push(&process->args, cmd);
>>
>> so that there is no new memory leak?
> 
> Sounds like a good idea (if I am not grossly mistaken as to what is
> being suggested).
> 
> Here is what I am planning to queue.
> 
> -- >8 --
> From: Johannes Sixt <j6t@kdbg.org>
> Date: Tue, 3 Oct 2017 22:24:57 +0200
> Subject: [PATCH] sub-process: use child_process.args instead of child_process.argv
> 
> Currently the argv is only allocated on the stack, and then assigned to
> process->argv.  When the start_subprocess function goes out of scope,
> the local argv variable is eliminated from the stack, but the pointer is
> still kept around in process->argv.
> 
> Much later when we try to access the same process->argv in
> finish_command, this leads us to access a memory location that no longer
> contains what we want.  As argv0 is only used for printing errors, this
> is not easily noticed in normal git operations.  However when running
> t0021-conversion.sh through valgrind, valgrind rightfully complains:
> 
> ==21024== Invalid read of size 8
> ==21024==    at 0x2ACF64: finish_command (run-command.c:869)
> ==21024==    by 0x2D6B18: subprocess_exit_handler (sub-process.c:72)
> ==21024==    by 0x2AB41E: cleanup_children (run-command.c:45)
> ==21024==    by 0x2AB526: cleanup_children_on_exit (run-command.c:81)
> ==21024==    by 0x54AD487: __run_exit_handlers (in /usr/lib/libc-2.26.so)
> ==21024==    by 0x54AD4D9: exit (in /usr/lib/libc-2.26.so)
> ==21024==    by 0x11A9EF: handle_builtin (git.c:550)
> ==21024==    by 0x11ABCC: run_argv (git.c:602)
> ==21024==    by 0x11AD8E: cmd_main (git.c:679)
> ==21024==    by 0x1BF125: main (common-main.c:43)
> ==21024==  Address 0x1ffeffec00 is on thread 1's stack
> ==21024==  1504 bytes below stack pointer
> ==21024==
> 
> These days, the child_process structure has its own args array, and
> the standard way to set up its argv[] is to use that one, instead of
> assigning to process->argv to point at an array that is outside.
> Use that facility automatically fixes this issue.
> 
> Reported-by: Thomas Gummerer <t.gummerer@gmail.com>
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>   sub-process.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/sub-process.c b/sub-process.c
> index fcc4832c14..648b3a3943 100644
> --- a/sub-process.c
> +++ b/sub-process.c
> @@ -74,13 +74,12 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
>   {
>   	int err;
>   	struct child_process *process;
> -	const char *argv[] = { cmd, NULL };
>   
>   	entry->cmd = cmd;
>   	process = &entry->process;
>   
>   	child_process_init(process);
> -	process->argv = argv;
> +	argv_array_push(&process->args, cmd);
>   	process->use_shell = 1;
>   	process->in = -1;
>   	process->out = -1;
> 

Thank you very much! That looks good. Just to be on the safe side:

Signed-off-by: Johannes Sixt <j6t@kdbg.org>

-- Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] http-push: fix construction of hex value from path
  2017-10-04  5:26           ` Jeff King
@ 2017-10-04  6:26             ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2017-10-04  6:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Thomas Gummerer, git, brian m. carlson

Jeff King <peff@peff.net> writes:

> More seriously, is there any interest in marking it as deprecated in the
> release notes and issuing a warning when it's used for a few cycles?

No objection from me.

I do not object with such a well designed deprecation plan for any
other features, if there are other "favourite" ones people would
want to deprecate and eventually remove, by the way.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* playing with MSan, was Re: [PATCH 0/3] fixes for running the test suite with --valgrind
  2017-10-03 23:41 ` [PATCH 0/3] fixes for running the test suite with --valgrind Jeff King
  2017-10-03 23:50   ` Jonathan Nieder
@ 2017-10-04 10:19   ` Jeff King
  2017-10-04 19:30     ` Thomas Gummerer
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-10-04 10:19 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Jonathan Nieder, git

On Tue, Oct 03, 2017 at 07:41:54PM -0400, Jeff King wrote:

> I think using SANITIZE=memory would catch these, but it needs some
> suppressions tuning. The weird "zlib reads uninitialized memory" error
> is a problem (valgrind sees this, too, but we have suppressions).

I dug into this a little more. You can blacklist certain functions from
getting MSan treatment, but that's not quite what we want. We want to
mark bytes from certain _sources_ as being initialized, even if MSan
doesn't agree.

And indeed, you can do that. As far as I can tell, MSan works by keeping
a shadow map of memory and setting flags when it believes it has been
initialized, and then checking that map when we make decisions based on
the memory. But it can only do that if it instruments all writes. So the
MSan documentation recommends that you build _everything_, including
libraries, with it. Which obviously we don't do if we're using a system
zlib. Or a system libc for that matter (though they intercept many
common libc functions to handle this).

So one strategy is to "cheat" a bit at the library interfaces, and claim
whatever they send us is properly initialized. The patch below tries
that with zlib, and it does seem to work. It would fail to notice a real
problem with any input we send _to_ the library (since the library isn't
instrumented, and we claim that whatever comes out of it is legitimate).
I could probably live with that.

But there are quite a few test failures that would still need
investigating and annotating:

  - Certainly it's confused by looking at regmatch_t results from
    regexec(). We can fix that by building with NO_REGEX. But pcre has
    a similar problem.

  - Ditto curl and openssl, whose exit points would need annotations.

  - For some reason test-sigchain segfaults when it raise()s in the
    signal handler and recurses. Not sure if this is an MSan bug or
    what.

So I dunno. This approach is a _lot_ more convenient than trying to
rebuild all the dependencies from scratch, and it runs way faster than
valgrind. It did find the cases that led to the patches in this
series, and at least one more: if the lstat() at the end of
entry.c:write_entry() fails, we write nonsense into the cache_entry.

I think we could probably get it to zero false positives without _too_
much effort. I'll stop here for tonight, but I may pick it up again
later (of course anybody else is welcome to fool around with it, too).

Below is the patch that let me run:

  make SANITIZE=memory CC=clang-6.0 NO_REGEX=1

and get a tractable number of errors.

-- >8 --
diff --git a/Makefile b/Makefile
index b143e4eea3..1da5c01211 100644
--- a/Makefile
+++ b/Makefile
@@ -1047,6 +1047,9 @@ endif
 ifneq ($(filter leak,$(SANITIZERS)),)
 BASIC_CFLAGS += -DSUPPRESS_ANNOTATED_LEAKS
 endif
+ifneq ($(filter memory,$(SANITIZERS)),)
+BASIC_CFLAGS += -DENABLE_MSAN_UNPOISON
+endif
 endif
 
 ifndef sysconfdir
diff --git a/git-compat-util.h b/git-compat-util.h
index cedad4d581..836a4c0b54 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1191,4 +1191,11 @@ extern void unleak_memory(const void *ptr, size_t len);
 #define UNLEAK(var) do {} while (0)
 #endif
 
+#ifdef ENABLE_MSAN_UNPOISON
+#include <sanitizer/msan_interface.h>
+#define msan_unpoison(ptr, len) __msan_unpoison(ptr, len)
+#else
+#define msan_unpoison(ptr, len) do {} while (0)
+#endif
+
 #endif
diff --git a/zlib.c b/zlib.c
index 4223f1a8c5..5fa8f12507 100644
--- a/zlib.c
+++ b/zlib.c
@@ -56,6 +56,8 @@ static void zlib_post_call(git_zstream *s)
 	if (s->z.total_in != s->total_in + bytes_consumed)
 		die("BUG: total_in mismatch");
 
+	msan_unpoison(s->next_out, bytes_produced);
+
 	s->total_out = s->z.total_out;
 	s->total_in = s->z.total_in;
 	s->next_in = s->z.next_in;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] path.c: fix uninitialized memory access
  2017-10-04  4:47         ` Junio C Hamano
  2017-10-04  5:21           ` Jeff King
@ 2017-10-04 19:22           ` Thomas Gummerer
  2017-10-04 19:36           ` Jonathan Nieder
  2 siblings, 0 replies; 30+ messages in thread
From: Thomas Gummerer @ 2017-10-04 19:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Jeff King, git,
	Nguyễn Thái Ngọc Duy

On 10/04, Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
> 
> > Jeff King wrote:
> >> On Tue, Oct 03, 2017 at 03:45:01PM -0700, Jonathan Nieder wrote:
> >
> >>> In other words, an alternative fix would be
> >>> 
> >>> 	if (*path == '.' && path[1] == '/') {
> >>> 		...
> >>> 	}
> >>> 
> >>> which would not require passing in 'len' or switching to index-based
> >>> arithmetic.  I think I prefer it.  What do you think?
> >>
> >> Yes, I think that approach is much nicer. I think you could even use
> >> skip_prefix. Unfortunately you have to play a few games with const-ness,
> >> but I think the resulting signature for cleanup_path() is an
> >> improvement:
> 
> To tie the loose end, here is what I'll queue.

Thanks.  This is much nicer indeed!

> -- >8 --
> From: Jeff King <peff@peff.net>
> Date: Tue, 3 Oct 2017 19:30:40 -0400
> Subject: [PATCH] path.c: fix uninitialized memory access
> 
> In cleanup_path we're passing in a char array, run a memcmp on it, and
> run through it without ever checking if something is in the array in the
> first place.  This can lead us to access uninitialized memory, for
> example in t5541-http-push-smart.sh test 7, when run under valgrind:
> 
> ==4423== Conditional jump or move depends on uninitialised value(s)
> ==4423==    at 0x242FA9: cleanup_path (path.c:35)
> ==4423==    by 0x242FA9: mkpath (path.c:456)
> ==4423==    by 0x256CC7: refname_match (refs.c:364)
> ==4423==    by 0x26C181: count_refspec_match (remote.c:1015)
> ==4423==    by 0x26C181: match_explicit_lhs (remote.c:1126)
> ==4423==    by 0x26C181: check_push_refs (remote.c:1409)
> ==4423==    by 0x2ABB4D: transport_push (transport.c:870)
> ==4423==    by 0x186703: push_with_options (push.c:332)
> ==4423==    by 0x18746D: do_push (push.c:409)
> ==4423==    by 0x18746D: cmd_push (push.c:566)
> ==4423==    by 0x1183E0: run_builtin (git.c:352)
> ==4423==    by 0x11973E: handle_builtin (git.c:539)
> ==4423==    by 0x11973E: run_argv (git.c:593)
> ==4423==    by 0x11973E: main (git.c:698)
> ==4423==  Uninitialised value was created by a heap allocation
> ==4423==    at 0x4C2CD8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==4423==    by 0x4C2F195: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==4423==    by 0x2C196B: xrealloc (wrapper.c:137)
> ==4423==    by 0x29A30B: strbuf_grow (strbuf.c:66)
> ==4423==    by 0x29A30B: strbuf_vaddf (strbuf.c:277)
> ==4423==    by 0x242F9F: mkpath (path.c:454)
> ==4423==    by 0x256CC7: refname_match (refs.c:364)
> ==4423==    by 0x26C181: count_refspec_match (remote.c:1015)
> ==4423==    by 0x26C181: match_explicit_lhs (remote.c:1126)
> ==4423==    by 0x26C181: check_push_refs (remote.c:1409)
> ==4423==    by 0x2ABB4D: transport_push (transport.c:870)
> ==4423==    by 0x186703: push_with_options (push.c:332)
> ==4423==    by 0x18746D: do_push (push.c:409)
> ==4423==    by 0x18746D: cmd_push (push.c:566)
> ==4423==    by 0x1183E0: run_builtin (git.c:352)
> ==4423==    by 0x11973E: handle_builtin (git.c:539)
> ==4423==    by 0x11973E: run_argv (git.c:593)
> ==4423==    by 0x11973E: main (git.c:698)
> ==4423==
> 
> Avoid this by using skip_prefix(), which knows not to go beyond the
> end of the string.
> 
> Reported-by: Thomas Gummerer <t.gummerer@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  path.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/path.c b/path.c
> index e50d2befcf..2fecf854fe 100644
> --- a/path.c
> +++ b/path.c
> @@ -33,11 +33,10 @@ static struct strbuf *get_pathname(void)
>  	return sb;
>  }
>  
> -static char *cleanup_path(char *path)
> +static const char *cleanup_path(const char *path)
>  {
>  	/* Clean it up */
> -	if (!memcmp(path, "./", 2)) {
> -		path += 2;
> +	if (skip_prefix(path, "./", &path)) {
>  		while (*path == '/')
>  			path++;
>  	}
> @@ -46,7 +45,7 @@ static char *cleanup_path(char *path)
>  
>  static void strbuf_cleanup_path(struct strbuf *sb)
>  {
> -	char *path = cleanup_path(sb->buf);
> +	const char *path = cleanup_path(sb->buf);
>  	if (path > sb->buf)
>  		strbuf_remove(sb, 0, path - sb->buf);
>  }
> @@ -63,7 +62,7 @@ char *mksnpath(char *buf, size_t n, const char *fmt, ...)
>  		strlcpy(buf, bad_path, n);
>  		return buf;
>  	}
> -	return cleanup_path(buf);
> +	return (char *)cleanup_path(buf);
>  }
>  
>  static int dir_prefix(const char *buf, const char *dir)
> -- 
> 2.14.2-889-gd2948f6aa6
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: playing with MSan, was Re: [PATCH 0/3] fixes for running the test suite with --valgrind
  2017-10-04 10:19   ` playing with MSan, was " Jeff King
@ 2017-10-04 19:30     ` Thomas Gummerer
  2017-10-05  3:46       ` lstat-ing delayed-filter output, was Re: playing with MSan Jeff King
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gummerer @ 2017-10-04 19:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, git

On 10/04, Jeff King wrote:
> On Tue, Oct 03, 2017 at 07:41:54PM -0400, Jeff King wrote:
> 
> > I think using SANITIZE=memory would catch these, but it needs some
> > suppressions tuning. The weird "zlib reads uninitialized memory" error
> > is a problem (valgrind sees this, too, but we have suppressions).
> 
> I dug into this a little more. You can blacklist certain functions from
> getting MSan treatment, but that's not quite what we want. We want to
> mark bytes from certain _sources_ as being initialized, even if MSan
> doesn't agree.
> 
> And indeed, you can do that. As far as I can tell, MSan works by keeping
> a shadow map of memory and setting flags when it believes it has been
> initialized, and then checking that map when we make decisions based on
> the memory. But it can only do that if it instruments all writes. So the
> MSan documentation recommends that you build _everything_, including
> libraries, with it. Which obviously we don't do if we're using a system
> zlib. Or a system libc for that matter (though they intercept many
> common libc functions to handle this).
> 
> So one strategy is to "cheat" a bit at the library interfaces, and claim
> whatever they send us is properly initialized. The patch below tries
> that with zlib, and it does seem to work. It would fail to notice a real
> problem with any input we send _to_ the library (since the library isn't
> instrumented, and we claim that whatever comes out of it is legitimate).
> I could probably live with that.
> 
> But there are quite a few test failures that would still need
> investigating and annotating:
> 
>   - Certainly it's confused by looking at regmatch_t results from
>     regexec(). We can fix that by building with NO_REGEX. But pcre has
>     a similar problem.
> 
>   - Ditto curl and openssl, whose exit points would need annotations.
> 
>   - For some reason test-sigchain segfaults when it raise()s in the
>     signal handler and recurses. Not sure if this is an MSan bug or
>     what.
> 
> So I dunno. This approach is a _lot_ more convenient than trying to
> rebuild all the dependencies from scratch, and it runs way faster than
> valgrind.  It did find the cases that led to the patches in this
> series, and at least one more: if the lstat() at the end of
> entry.c:write_entry() fails, we write nonsense into the cache_entry.

Yeah valgrind found that one too, as I tried (and apparently failed :))
to explain in the cover letter.  I just haven't found the time yet to
actually try and go fix that one.

> I think we could probably get it to zero false positives without _too_
> much effort. I'll stop here for tonight, but I may pick it up again
> later (of course anybody else is welcome to fool around with it, too).
> 
> Below is the patch that let me run:
> 
>   make SANITIZE=memory CC=clang-6.0 NO_REGEX=1
> 
> and get a tractable number of errors.
> 
> -- >8 --
> diff --git a/Makefile b/Makefile
> index b143e4eea3..1da5c01211 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1047,6 +1047,9 @@ endif
>  ifneq ($(filter leak,$(SANITIZERS)),)
>  BASIC_CFLAGS += -DSUPPRESS_ANNOTATED_LEAKS
>  endif
> +ifneq ($(filter memory,$(SANITIZERS)),)
> +BASIC_CFLAGS += -DENABLE_MSAN_UNPOISON
> +endif
>  endif
>  
>  ifndef sysconfdir
> diff --git a/git-compat-util.h b/git-compat-util.h
> index cedad4d581..836a4c0b54 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1191,4 +1191,11 @@ extern void unleak_memory(const void *ptr, size_t len);
>  #define UNLEAK(var) do {} while (0)
>  #endif
>  
> +#ifdef ENABLE_MSAN_UNPOISON
> +#include <sanitizer/msan_interface.h>
> +#define msan_unpoison(ptr, len) __msan_unpoison(ptr, len)
> +#else
> +#define msan_unpoison(ptr, len) do {} while (0)
> +#endif
> +
>  #endif
> diff --git a/zlib.c b/zlib.c
> index 4223f1a8c5..5fa8f12507 100644
> --- a/zlib.c
> +++ b/zlib.c
> @@ -56,6 +56,8 @@ static void zlib_post_call(git_zstream *s)
>  	if (s->z.total_in != s->total_in + bytes_consumed)
>  		die("BUG: total_in mismatch");
>  
> +	msan_unpoison(s->next_out, bytes_produced);
> +
>  	s->total_out = s->z.total_out;
>  	s->total_in = s->z.total_in;
>  	s->next_in = s->z.next_in;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] sub-process: allocate argv on the heap
  2017-10-04  4:59     ` Junio C Hamano
  2017-10-04  5:32       ` Jeff King
  2017-10-04  5:58       ` Johannes Sixt
@ 2017-10-04 19:31       ` Thomas Gummerer
  2 siblings, 0 replies; 30+ messages in thread
From: Thomas Gummerer @ 2017-10-04 19:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, git, Lars Schneider, Ben Peart, Ben Peart,
	Jakub Narębski

On 10/04, Junio C Hamano wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
> > Am 03.10.2017 um 21:57 schrieb Thomas Gummerer:
> >> diff --git a/sub-process.c b/sub-process.c
> >> index 6dde5062be..4680af8193 100644
> >> --- a/sub-process.c
> >> +++ b/sub-process.c
> >> @@ -77,7 +77,9 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
> >>   {
> >>   	int err;
> >>   	struct child_process *process;
> >> -	const char *argv[] = { cmd, NULL };
> >> +	const char **argv = xmalloc(2 * sizeof(char *));
> >> +	argv[0] = cmd;
> >> +	argv[1] = NULL;
> >>     	entry->cmd = cmd;
> >>   	process = &entry->process;
> >>
> >
> > Perhaps this should become
> >
> > 	argv_array_push(&process->args, cmd);
> >
> > so that there is no new memory leak?
> 
> Sounds like a good idea (if I am not grossly mistaken as to what is
> being suggested).
> 
> Here is what I am planning to queue.
> 
> -- >8 --
> From: Johannes Sixt <j6t@kdbg.org>
> Date: Tue, 3 Oct 2017 22:24:57 +0200
> Subject: [PATCH] sub-process: use child_process.args instead of child_process.argv
> 
> Currently the argv is only allocated on the stack, and then assigned to
> process->argv.  When the start_subprocess function goes out of scope,
> the local argv variable is eliminated from the stack, but the pointer is
> still kept around in process->argv.
> 
> Much later when we try to access the same process->argv in
> finish_command, this leads us to access a memory location that no longer
> contains what we want.  As argv0 is only used for printing errors, this
> is not easily noticed in normal git operations.  However when running
> t0021-conversion.sh through valgrind, valgrind rightfully complains:
> 
> ==21024== Invalid read of size 8
> ==21024==    at 0x2ACF64: finish_command (run-command.c:869)
> ==21024==    by 0x2D6B18: subprocess_exit_handler (sub-process.c:72)
> ==21024==    by 0x2AB41E: cleanup_children (run-command.c:45)
> ==21024==    by 0x2AB526: cleanup_children_on_exit (run-command.c:81)
> ==21024==    by 0x54AD487: __run_exit_handlers (in /usr/lib/libc-2.26.so)
> ==21024==    by 0x54AD4D9: exit (in /usr/lib/libc-2.26.so)
> ==21024==    by 0x11A9EF: handle_builtin (git.c:550)
> ==21024==    by 0x11ABCC: run_argv (git.c:602)
> ==21024==    by 0x11AD8E: cmd_main (git.c:679)
> ==21024==    by 0x1BF125: main (common-main.c:43)
> ==21024==  Address 0x1ffeffec00 is on thread 1's stack
> ==21024==  1504 bytes below stack pointer
> ==21024==
> 
> These days, the child_process structure has its own args array, and
> the standard way to set up its argv[] is to use that one, instead of
> assigning to process->argv to point at an array that is outside.
> Use that facility automatically fixes this issue.
> 
> Reported-by: Thomas Gummerer <t.gummerer@gmail.com>
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  sub-process.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/sub-process.c b/sub-process.c
> index fcc4832c14..648b3a3943 100644
> --- a/sub-process.c
> +++ b/sub-process.c
> @@ -74,13 +74,12 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
>  {
>  	int err;
>  	struct child_process *process;
> -	const char *argv[] = { cmd, NULL };
>  
>  	entry->cmd = cmd;
>  	process = &entry->process;
>  
>  	child_process_init(process);
> -	process->argv = argv;
> +	argv_array_push(&process->args, cmd);

Thanks!  *Much* nicer than what I had :)

>  	process->use_shell = 1;
>  	process->in = -1;
>  	process->out = -1;
> -- 
> 2.14.2-889-gd2948f6aa6
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] path.c: fix uninitialized memory access
  2017-10-04  4:47         ` Junio C Hamano
  2017-10-04  5:21           ` Jeff King
  2017-10-04 19:22           ` Thomas Gummerer
@ 2017-10-04 19:36           ` Jonathan Nieder
  2 siblings, 0 replies; 30+ messages in thread
From: Jonathan Nieder @ 2017-10-04 19:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Thomas Gummerer, git,
	Nguyễn Thái Ngọc Duy

Junio C Hamano wrote:

> From: Jeff King <peff@peff.net>
> Date: Tue, 3 Oct 2017 19:30:40 -0400
> Subject: [PATCH] path.c: fix uninitialized memory access
> 
> In cleanup_path we're passing in a char array, run a memcmp on it, and
> run through it without ever checking if something is in the array in the
> first place.  This can lead us to access uninitialized memory, for
> example in t5541-http-push-smart.sh test 7, when run under valgrind:
>
> ==4423== Conditional jump or move depends on uninitialised value(s)
> ==4423==    at 0x242FA9: cleanup_path (path.c:35)
[...]
> ==4423==  Uninitialised value was created by a heap allocation
[...]
> ==4423==    by 0x29A30B: strbuf_grow (strbuf.c:66)
> ==4423==    by 0x29A30B: strbuf_vaddf (strbuf.c:277)
> ==4423==    by 0x242F9F: mkpath (path.c:454)
[...]
> Avoid this by using skip_prefix(), which knows not to go beyond the
> end of the string.
>
> Reported-by: Thomas Gummerer <t.gummerer@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

This is indeed
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* lstat-ing delayed-filter output, was Re: playing with MSan
  2017-10-04 19:30     ` Thomas Gummerer
@ 2017-10-05  3:46       ` Jeff King
  2017-10-05 10:47         ` Lars Schneider
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2017-10-05  3:46 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Jonathan Nieder, git, Lars Schneider

On Wed, Oct 04, 2017 at 08:30:05PM +0100, Thomas Gummerer wrote:

> > So I dunno. This approach is a _lot_ more convenient than trying to
> > rebuild all the dependencies from scratch, and it runs way faster than
> > valgrind.  It did find the cases that led to the patches in this
> > series, and at least one more: if the lstat() at the end of
> > entry.c:write_entry() fails, we write nonsense into the cache_entry.
> 
> Yeah valgrind found that one too, as I tried (and apparently failed :))
> to explain in the cover letter.  I just haven't found the time yet to
> actually try and go fix that one.

No, I just have poor memory. :)

The obvious fix is that we should check the return value of `lstat`, but
the bigger question is why and when that would fail.

The case triggered by t0021 is using the new "delayed" filter mechanism.
So at the time that write_entry() finishes, we don't actually have the
file in the filesystem. I think we need to recognize that we got delayed
and didn't actually check anything out, and skip that whole "if
(state->refresh_cache)" block. It's not clear to me, though, how we tell
the difference between the delayed and normal cases in that function.

But I think this lstat could also fail if we are checking out and
somebody else racily deletes our file. This is presumably sufficiently
rare that I actually wonder if we should just bail with an error, so
that the user knows something funny is going on.

+cc Lars for thoughts no the delayed-filter case.

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: lstat-ing delayed-filter output, was Re: playing with MSan
  2017-10-05  3:46       ` lstat-ing delayed-filter output, was Re: playing with MSan Jeff King
@ 2017-10-05 10:47         ` Lars Schneider
  0 siblings, 0 replies; 30+ messages in thread
From: Lars Schneider @ 2017-10-05 10:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Gummerer, Jonathan Nieder, git


> On 05 Oct 2017, at 05:46, Jeff King <peff@peff.net> wrote:
> 
> On Wed, Oct 04, 2017 at 08:30:05PM +0100, Thomas Gummerer wrote:
> 
>>> So I dunno. This approach is a _lot_ more convenient than trying to
>>> rebuild all the dependencies from scratch, and it runs way faster than
>>> valgrind.  It did find the cases that led to the patches in this
>>> series, and at least one more: if the lstat() at the end of
>>> entry.c:write_entry() fails, we write nonsense into the cache_entry.
>> 
>> Yeah valgrind found that one too, as I tried (and apparently failed :))
>> to explain in the cover letter.  I just haven't found the time yet to
>> actually try and go fix that one.
> 
> No, I just have poor memory. :)
> 
> The obvious fix is that we should check the return value of `lstat`, but
> the bigger question is why and when that would fail.
> 
> The case triggered by t0021 is using the new "delayed" filter mechanism.
> So at the time that write_entry() finishes, we don't actually have the
> file in the filesystem. I think we need to recognize that we got delayed
> and didn't actually check anything out, and skip that whole "if
> (state->refresh_cache)" block. It's not clear to me, though, how we tell
> the difference between the delayed and normal cases in that function.

Oh. Great catch!


> But I think this lstat could also fail if we are checking out and
> somebody else racily deletes our file. This is presumably sufficiently
> rare that I actually wonder if we should just bail with an error, so
> that the user knows something funny is going on.

That sounds good to me!


I tried to address both issues here:
https://public-inbox.org/git/20171005104407.65948-1-lars.schneider@autodesk.com/

Cheers,
Lars

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-10-05 10:47 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-03 19:57 [PATCH 0/3] fixes for running the test suite with --valgrind Thomas Gummerer
2017-10-03 19:57 ` [PATCH 1/3] path.c: fix uninitialized memory access Thomas Gummerer
2017-10-03 22:45   ` Jonathan Nieder
2017-10-03 23:30     ` Jeff King
2017-10-03 23:37       ` Jonathan Nieder
2017-10-04  4:47         ` Junio C Hamano
2017-10-04  5:21           ` Jeff King
2017-10-04 19:22           ` Thomas Gummerer
2017-10-04 19:36           ` Jonathan Nieder
2017-10-03 19:57 ` [PATCH 2/3] http-push: fix construction of hex value from path Thomas Gummerer
2017-10-03 22:53   ` Jonathan Nieder
2017-10-03 23:36     ` Jeff King
2017-10-04  4:48       ` Junio C Hamano
2017-10-04  5:20         ` Junio C Hamano
2017-10-04  5:26           ` Jeff King
2017-10-04  6:26             ` Junio C Hamano
2017-10-03 19:57 ` [PATCH 3/3] sub-process: allocate argv on the heap Thomas Gummerer
2017-10-03 20:24   ` Johannes Sixt
2017-10-04  4:59     ` Junio C Hamano
2017-10-04  5:32       ` Jeff King
2017-10-04  5:58       ` Johannes Sixt
2017-10-04 19:31       ` Thomas Gummerer
2017-10-03 20:25   ` Stefan Beller
2017-10-03 23:41 ` [PATCH 0/3] fixes for running the test suite with --valgrind Jeff King
2017-10-03 23:50   ` Jonathan Nieder
2017-10-03 23:54     ` Jeff King
2017-10-04 10:19   ` playing with MSan, was " Jeff King
2017-10-04 19:30     ` Thomas Gummerer
2017-10-05  3:46       ` lstat-ing delayed-filter output, was Re: playing with MSan Jeff King
2017-10-05 10:47         ` Lars Schneider

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).