[PATCH 0/1] Fix an off-by-one bug in the untracked cache code

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/1] Fix an off-by-one bug in the untracked cache code
@ 2019-04-10 12:56 Johannes Schindelin via GitGitGadget
  2019-04-10 12:56 ` [PATCH 1/1] untracked cache: fix off-by-one Johannes Schindelin via GitGitGadget
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Frankly, I am surprised it took me this long to discover this bug, as I am
running with the untracked cache for ages now.

But as of recent, I am running more and more long-running commands, mostly
rebases to keep what I call "ever-green" branches up to date, where Git for
Windows' patch thicket is rebased onto core Git's four integration branches,
and when there are no merge conflicts to stop the rebase early, I got this
really obscure crash (no error message, no nothing to indicate what it was).

It cost me two full work days to chase this one down.

Johannes Schindelin (1):
  untracked cache: fix off-by-one

 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

base-commit: e35b8cb8e212e3557efc565157ceb5cbaaf0d87f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-178%2Fdscho%2Funtracked-cache-off-by-one-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-178/dscho/untracked-cache-off-by-one-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/178
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/1] untracked cache: fix off-by-one
  2019-04-10 12:56 [PATCH 0/1] Fix an off-by-one bug in the untracked cache code Johannes Schindelin via GitGitGadget
@ 2019-04-10 12:56 ` Johannes Schindelin via GitGitGadget
  2019-04-10 16:20   ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In f9e6c649589e (untracked cache: load from UNTR index extension,
2015-03-08), code was added to read back the untracked cache from an
index extension.

Probably in the endeavor to avoid the `calloc()` implied by
`FLEX_ALLOC_STR()` (it is hard to know why exactly, the commit message
of that commit is a bit parsimonious with information), it calls
`malloc()` manually and then `memcpy()`s the bits and pieces into place.

It allocates the size of `struct untracked_cache_dir` plus the string
length of the untracked file name, then copies the information in two
steps: first the fixed-size metadata, then the name. And here lies the
rub: it includes the trailing NUL byte in the name.

If `FLEX_ARRAY` is defined as 0, this results in a buffer overrun.

To fix this, let's just add 1, for the trailing NUL byte. Technically,
this overallocates on platforms where `FLEX_ARRAY` is 1, but it should
not matter much in reality, as `malloc()` usually overallocates anyway,
unless the size to allocate aligns exactly with some internal chunk size
(see below for more on that).

The real strange thing is that neither valgrind nor DrMemory catches
this bug. In this developer's tests, a `memcpy()` (but not a
`memset()`!) could write up to 4 bytes after the allocated memory range
before valgrind would start reporting an issue.

However, when running Git built with nedmalloc as allocator, under rare
conditions (and inconsistently at that), this bug triggered an `abort()`
because nedmalloc rounds up the size to be `malloc()`ed to a multiple of
a certain chunk size, then adds a few bytes to be used for storing some
internal state. If there is no rounding up to do (because the size is
already a multiple of that chunk size), and if the buffer is overrun as
in the code patched in this commit, the internal state is corrupted.

The scenario that triggered this here bug fix entailed a git.git
checkout with an extra copy of the source code in an untracked
subdirectory, meaning that there was an untracked subdirectory called
"thunderbird-patch-inline" whose name's length is exactly 24 bytes,
which, added to the size of above-mentioned `struct untracked_cache_dir`
that weighs in with 104 bytes on a 64-bit system, amounts to 128,
aligning perfectly with nedmalloc's chunk size.

As there is no obvious way to trigger this bug reliably, on all
platforms supported by Git, and as the bug is obvious enough, this patch
comes without a regression test.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index b2cabadf25..f5293a6536 100644
--- a/dir.c
+++ b/dir.c
@@ -2760,7 +2760,7 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
 	next = data + len + 1;
 	if (next > rd->end)
 		return -1;
-	*untracked_ = untracked = xmalloc(st_add(sizeof(*untracked), len));
+	*untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), len, 1));
 	memcpy(untracked, &ud, sizeof(ud));
 	memcpy(untracked->name, data, len + 1);
 	data = next;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] untracked cache: fix off-by-one
  2019-04-10 12:56 ` [PATCH 1/1] untracked cache: fix off-by-one Johannes Schindelin via GitGitGadget
@ 2019-04-10 16:20   ` Jeff King
  2019-04-12  1:48     ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2019-04-10 16:20 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Junio C Hamano, Johannes Schindelin

On Wed, Apr 10, 2019 at 05:56:48AM -0700, Johannes Schindelin via GitGitGadget wrote:

> Probably in the endeavor to avoid the `calloc()` implied by
> `FLEX_ALLOC_STR()` (it is hard to know why exactly, the commit message
> of that commit is a bit parsimonious with information), it calls
> `malloc()` manually and then `memcpy()`s the bits and pieces into place.

You have a talent for understatement. :)

> It allocates the size of `struct untracked_cache_dir` plus the string
> length of the untracked file name, then copies the information in two
> steps: first the fixed-size metadata, then the name. And here lies the
> rub: it includes the trailing NUL byte in the name.
> 
> If `FLEX_ARRAY` is defined as 0, this results in a buffer overrun.
> 
> To fix this, let's just add 1, for the trailing NUL byte. Technically,
> this overallocates on platforms where `FLEX_ARRAY` is 1, but it should
> not matter much in reality, as `malloc()` usually overallocates anyway,
> unless the size to allocate aligns exactly with some internal chunk size
> (see below for more on that).

Yeah, every struct on a platform where FLEX_ARRAY is 1 ends up
over-allocated by 1 byte. We could account for that in all of our flex
allocations, but I don't it affects enough platforms to be worth the
bother.

> The real strange thing is that neither valgrind nor DrMemory catches
> this bug. In this developer's tests, a `memcpy()` (but not a
> `memset()`!) could write up to 4 bytes after the allocated memory range
> before valgrind would start reporting an issue.

I couldn't get it to trigger with ASan, either.  I assume it's due to
alignment (i.e., those tools are stuck poisoning at the end of a page,
but the start of the struct needs to be page-aligned). But that would
imply that we could trigger it with different path lengths, which I
wasn't able to do. So I dunno.

> However, when running Git built with nedmalloc as allocator, under rare
> conditions (and inconsistently at that), this bug triggered an `abort()`
> because nedmalloc rounds up the size to be `malloc()`ed to a multiple of
> a certain chunk size, then adds a few bytes to be used for storing some
> internal state. If there is no rounding up to do (because the size is
> already a multiple of that chunk size), and if the buffer is overrun as
> in the code patched in this commit, the internal state is corrupted.
>
> The scenario that triggered this here bug fix entailed a git.git
> checkout with an extra copy of the source code in an untracked
> subdirectory, meaning that there was an untracked subdirectory called
> "thunderbird-patch-inline" whose name's length is exactly 24 bytes,
> which, added to the size of above-mentioned `struct untracked_cache_dir`
> that weighs in with 104 bytes on a 64-bit system, amounts to 128,
> aligning perfectly with nedmalloc's chunk size.

Right, that makes sense that the length would impact it there.

> As there is no obvious way to trigger this bug reliably, on all
> platforms supported by Git, and as the bug is obvious enough, this patch
> comes without a regression test.

Makes sense. This code path should be well-covered by the existing tests
anyway, so even if we could get those tools to trigger, I don't think
there would be much point in adding a new test.

> diff --git a/dir.c b/dir.c
> index b2cabadf25..f5293a6536 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2760,7 +2760,7 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
>  	next = data + len + 1;
>  	if (next > rd->end)
>  		return -1;
> -	*untracked_ = untracked = xmalloc(st_add(sizeof(*untracked), len));
> +	*untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), len, 1));
>  	memcpy(untracked, &ud, sizeof(ud));
>  	memcpy(untracked->name, data, len + 1);

This is obviously correct, even just from the context.

IMHO it is worth it in cases like this to just use FLEX_ALLOC for
simplicity; calloc() is not too expensive. However, there's an
interesting subtlety there. Our length comes from this line just above
your hunk:

  len = strlen((const char *)data);

how do we know that data actually contains a NUL? It's ultimately
pointing to our mmap of the index file. So I think a malformed index
would potentially cause us to go off the end of the array and read
arbitrary memory.

The right thing is probably something like:

  eos = memchr(data, '\0', end - data);
  if (!eos)
	return error("malformed untracked cache extension");
  len = eos - data;

I wouldn't be at all surprised if other bits of the index code have the
same issue, though. And at any rate, thinking about that should
definitely not hold up your fix.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] untracked cache: fix off-by-one
  2019-04-10 16:20   ` Jeff King
@ 2019-04-12  1:48     ` Junio C Hamano
  2019-04-18 21:14       ` [PATCH 0/3] untracked cache parsing fixups Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2019-04-12  1:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> Yeah, every struct on a platform where FLEX_ARRAY is 1 ends up
> over-allocated by 1 byte. We could account for that in all of our flex
> allocations, but I don't it affects enough platforms to be worth the
> bother.

It was an explicit design decision to declare that the
overallocation was a non-problem back when we invented FLEX_ARRAY
macro.  We could probably have instead decided to pass number of
bytes to be stored in the flex-array (including NUL if that is a
character array) then subtract the value of FLEX_ARRAY if FLEX_ARRAY
were limited to only 0 or 1, but that was not workable as it is the
most natural to define FLEX_ARRAY to an empty string, i.e. making an
array field decl to "type field[];", for the more recent compilers.

>> As there is no obvious way to trigger this bug reliably, on all
>> platforms supported by Git, and as the bug is obvious enough, this patch
>> comes without a regression test.
>
> Makes sense. This code path should be well-covered by the existing tests
> anyway, so even if we could get those tools to trigger, I don't think
> there would be much point in adding a new test.

Yeah, it is already super that this obscure bug was spotted in the
first place.  It is true that this may regress without a test, but I
do not think it is reasonable to expect that it is possible to write
a reliable crash-detecting test for this one.

> The right thing is probably something like:
>
>   eos = memchr(data, '\0', end - data);
>   if (!eos)
> 	return error("malformed untracked cache extension");
>   len = eos - data;
>
> I wouldn't be at all surprised if other bits of the index code have the
> same issue, though. And at any rate, thinking about that should
> definitely not hold up your fix.

True, true.  I wonder if folks intereseted in libFuzzer can chime in
with something useful here, but that is totally independent.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 0/3] untracked cache parsing fixups
  2019-04-12  1:48     ` Junio C Hamano
@ 2019-04-18 21:14       ` Jeff King
  2019-04-18 21:17         ` [PATCH 1/3] untracked-cache: be defensive about missing NULs in index Jeff King
                           ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Jeff King @ 2019-04-18 21:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin,
	Nguyễn Thái Ngọc Duy

[+cc Duy as the master of all things untracked-cache]

On Fri, Apr 12, 2019 at 10:48:30AM +0900, Junio C Hamano wrote:

> > The right thing is probably something like:
> >
> >   eos = memchr(data, '\0', end - data);
> >   if (!eos)
> > 	return error("malformed untracked cache extension");
> >   len = eos - data;
> >
> > I wouldn't be at all surprised if other bits of the index code have the
> > same issue, though. And at any rate, thinking about that should
> > definitely not hold up your fix.
> 
> True, true.  I wonder if folks intereseted in libFuzzer can chime in
> with something useful here, but that is totally independent.

Just so we don't forget about it, I wrote this fix up as a patch. And in
fact it led to a few other cleanups. I think the first one is definitely
worth doing now, even if there are other similar cases lurking in the
rest of the index code.

The other two are optional, though I think they are worth it (and not
too hard to verify that they are doing the right thing).

These are on top of js/untracked-cache-allocfix (though they could
easily be ported to a separate topic if we want).

  [1/3]: untracked-cache: be defensive about missing NULs in index
  [2/3]: untracked-cache: simplify parsing by dropping "next"
  [3/3]: untracked-cache: simplify parsing by dropping "len"

 dir.c | 41 ++++++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 23 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] untracked-cache: be defensive about missing NULs in index
  2019-04-18 21:14       ` [PATCH 0/3] untracked cache parsing fixups Jeff King
@ 2019-04-18 21:17         ` Jeff King
  2019-04-19  5:29           ` Junio C Hamano
  2019-04-18 21:17         ` [PATCH 2/3] untracked-cache: simplify parsing by dropping "next" Jeff King
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2019-04-18 21:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin,
	Nguyễn Thái Ngọc Duy

The on-disk format for the untracked-cache extension contains
NUL-terminated filenames. We parse these from the mmap'd file using
string functions like strlen(). This works fine in the normal case, but
if we see a malformed or corrupted index, we might read off the end of
our mmap.

Instead, let's use memchr() to find the trailing NUL within the bytes we
know are available, and return an error if it's missing.

Note that we can further simplify by folding another range check into
our conditional. After we find the end of the string, we set "next" to
the byte after the string and treat it as an error if there are no such
bytes left. That saves us from having to do a range check at the
beginning of each subsequent string (and works because there is always
data after each string). We can do both range checks together by
checking "!eos" (we didn't find a NUL) and "eos == end" (it was on the
last available byte, meaning there's nothing after). This replaces the
existing "next > end" checks.

Note also that the decode_varint() calls have a similar problem (we
don't even pass them "end"; they just keep parsing). These are probably
OK in practice since varints have a finite length (we stop parsing when
we'd overflow a uintmax_t), so the worst case is that we'd overflow into
reading the trailing bytes of the index.

Signed-off-by: Jeff King <peff@peff.net>
---
 dir.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index f5293a6536..7b0513c476 100644
--- a/dir.c
+++ b/dir.c
@@ -2733,6 +2733,7 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
 {
 	struct untracked_cache_dir ud, *untracked;
 	const unsigned char *next, *data = rd->data, *end = rd->end;
+	const unsigned char *eos;
 	unsigned int value;
 	int i, len;

@@ -2756,21 +2757,24 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
 	ALLOC_ARRAY(ud.dirs, ud.dirs_nr);
 	data = next;

-	len = strlen((const char *)data);
-	next = data + len + 1;
-	if (next > rd->end)
+	eos = memchr(data, '\0', end - data);
+	if (!eos || eos == end)
 		return -1;
+	len = eos - data;
+	next = eos + 1;
+
 	*untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), len, 1));
 	memcpy(untracked, &ud, sizeof(ud));
 	memcpy(untracked->name, data, len + 1);
 	data = next;

 	for (i = 0; i < untracked->untracked_nr; i++) {
-		len = strlen((const char *)data);
-		next = data + len + 1;
-		if (next > rd->end)
+		eos = memchr(data, '\0', end - data);
+		if (!eos || eos == end)
 			return -1;
-		untracked->untracked[i] = xstrdup((const char*)data);
+		len = eos - data;
+		next = eos + 1;
+		untracked->untracked[i] = xmemdupz(data, len);
 		data = next;
 	}

-- 
2.21.0.1092.g8b0302e9c4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] untracked-cache: be defensive about missing NULs in index
  2019-04-18 21:17         ` [PATCH 1/3] untracked-cache: be defensive about missing NULs in index Jeff King
@ 2019-04-19  5:29           ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2019-04-19  5:29 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin,
	Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

> -	len = strlen((const char *)data);
> -	next = data + len + 1;
> -	if (next > rd->end)
> +	eos = memchr(data, '\0', end - data);
> +	if (!eos || eos == end)
>  		return -1;
> +	len = eos - data;
> +	next = eos + 1;

Yup, much nicer.

> -		len = strlen((const char *)data);
> -		next = data + len + 1;
> -		if (next > rd->end)
> +		eos = memchr(data, '\0', end - data);
> +		if (!eos || eos == end)
>  			return -1;
> -		untracked->untracked[i] = xstrdup((const char*)data);
> +		len = eos - data;
> +		next = eos + 1;
> +		untracked->untracked[i] = xmemdupz(data, len);

Same here, too.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/3] untracked-cache: simplify parsing by dropping "next"
  2019-04-18 21:14       ` [PATCH 0/3] untracked cache parsing fixups Jeff King
  2019-04-18 21:17         ` [PATCH 1/3] untracked-cache: be defensive about missing NULs in index Jeff King
@ 2019-04-18 21:17         ` Jeff King
  2019-04-19  5:33           ` Junio C Hamano
  2019-04-18 21:18         ` [PATCH 3/3] untracked-cache: simplify parsing by dropping "len" Jeff King
  2019-04-18 21:24         ` [PATCH 4/3] untracked-cache: use FLEX_ALLOC to create internal structs Jeff King
  3 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2019-04-18 21:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin,
	Nguyễn Thái Ngọc Duy

When we parse an on-disk untracked cache, we have two pointers, "data"
and "next". As we parse, we point "next" to the end of an element, and
then later update "data" to match.

But we actually don't need two pointers. Each parsing step can just
update "data" directly from other variables we hold (and we don't have
to worry about bailing in an intermediate state, since any parsing
failure causes us to immediately discard "data" and return).

Signed-off-by: Jeff King <peff@peff.net>
---
 dir.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/dir.c b/dir.c
index 7b0513c476..17865f44df 100644
--- a/dir.c
+++ b/dir.c
@@ -2732,50 +2732,44 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
 			struct read_data *rd)
 {
 	struct untracked_cache_dir ud, *untracked;
-	const unsigned char *next, *data = rd->data, *end = rd->end;
+	const unsigned char *data = rd->data, *end = rd->end;
 	const unsigned char *eos;
 	unsigned int value;
 	int i, len;
 
 	memset(&ud, 0, sizeof(ud));
 
-	next = data;
-	value = decode_varint(&next);
-	if (next > end)
+	value = decode_varint(&data);
+	if (data > end)
 		return -1;
 	ud.recurse	   = 1;
 	ud.untracked_alloc = value;
 	ud.untracked_nr	   = value;
 	if (ud.untracked_nr)
 		ALLOC_ARRAY(ud.untracked, ud.untracked_nr);
-	data = next;
 
-	next = data;
-	ud.dirs_alloc = ud.dirs_nr = decode_varint(&next);
-	if (next > end)
+	ud.dirs_alloc = ud.dirs_nr = decode_varint(&data);
+	if (data > end)
 		return -1;
 	ALLOC_ARRAY(ud.dirs, ud.dirs_nr);
-	data = next;
 
 	eos = memchr(data, '\0', end - data);
 	if (!eos || eos == end)
 		return -1;
 	len = eos - data;
-	next = eos + 1;
 
 	*untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), len, 1));
 	memcpy(untracked, &ud, sizeof(ud));
 	memcpy(untracked->name, data, len + 1);
-	data = next;
+	data = eos + 1;
 
 	for (i = 0; i < untracked->untracked_nr; i++) {
 		eos = memchr(data, '\0', end - data);
 		if (!eos || eos == end)
 			return -1;
 		len = eos - data;
-		next = eos + 1;
 		untracked->untracked[i] = xmemdupz(data, len);
-		data = next;
+		data = eos + 1;
 	}
 
 	rd->ucd[rd->index++] = untracked;
-- 
2.21.0.1092.g8b0302e9c4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] untracked-cache: simplify parsing by dropping "next"
  2019-04-18 21:17         ` [PATCH 2/3] untracked-cache: simplify parsing by dropping "next" Jeff King
@ 2019-04-19  5:33           ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2019-04-19  5:33 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin,
	Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

> When we parse an on-disk untracked cache, we have two pointers, "data"
> and "next". As we parse, we point "next" to the end of an element, and
> then later update "data" to match.
>
> But we actually don't need two pointers. Each parsing step can just
> update "data" directly from other variables we hold (and we don't have
> to worry about bailing in an intermediate state, since any parsing
> failure causes us to immediately discard "data" and return).

;-)  

My first reaction was "you can do so now you have introduced
eos--why didn't you do that in the previous step?", but losing
'next' from the varint parsing step would certainly have been
possible even before that change.  So I agree that it makes much
more sense to do this step separately from the previous one.

The code after the patch certainly reads easier and simpler.

Thanks.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  dir.c | 20 +++++++-------------
>  1 file changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 7b0513c476..17865f44df 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2732,50 +2732,44 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
>  			struct read_data *rd)
>  {
>  	struct untracked_cache_dir ud, *untracked;
> -	const unsigned char *next, *data = rd->data, *end = rd->end;
> +	const unsigned char *data = rd->data, *end = rd->end;
>  	const unsigned char *eos;
>  	unsigned int value;
>  	int i, len;
>  
>  	memset(&ud, 0, sizeof(ud));
>  
> -	next = data;
> -	value = decode_varint(&next);
> -	if (next > end)
> +	value = decode_varint(&data);
> +	if (data > end)
>  		return -1;
>  	ud.recurse	   = 1;
>  	ud.untracked_alloc = value;
>  	ud.untracked_nr	   = value;
>  	if (ud.untracked_nr)
>  		ALLOC_ARRAY(ud.untracked, ud.untracked_nr);
> -	data = next;
>  
> -	next = data;
> -	ud.dirs_alloc = ud.dirs_nr = decode_varint(&next);
> -	if (next > end)
> +	ud.dirs_alloc = ud.dirs_nr = decode_varint(&data);
> +	if (data > end)
>  		return -1;
>  	ALLOC_ARRAY(ud.dirs, ud.dirs_nr);
> -	data = next;
>  
>  	eos = memchr(data, '\0', end - data);
>  	if (!eos || eos == end)
>  		return -1;
>  	len = eos - data;
> -	next = eos + 1;
>  
>  	*untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), len, 1));
>  	memcpy(untracked, &ud, sizeof(ud));
>  	memcpy(untracked->name, data, len + 1);
> -	data = next;
> +	data = eos + 1;
>  
>  	for (i = 0; i < untracked->untracked_nr; i++) {
>  		eos = memchr(data, '\0', end - data);
>  		if (!eos || eos == end)
>  			return -1;
>  		len = eos - data;
> -		next = eos + 1;
>  		untracked->untracked[i] = xmemdupz(data, len);
> -		data = next;
> +		data = eos + 1;
>  	}
>  
>  	rd->ucd[rd->index++] = untracked;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/3] untracked-cache: simplify parsing by dropping "len"
  2019-04-18 21:14       ` [PATCH 0/3] untracked cache parsing fixups Jeff King
  2019-04-18 21:17         ` [PATCH 1/3] untracked-cache: be defensive about missing NULs in index Jeff King
  2019-04-18 21:17         ` [PATCH 2/3] untracked-cache: simplify parsing by dropping "next" Jeff King
@ 2019-04-18 21:18         ` Jeff King
  2019-04-18 21:24         ` [PATCH 4/3] untracked-cache: use FLEX_ALLOC to create internal structs Jeff King
  3 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2019-04-18 21:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin,
	Nguyễn Thái Ngọc Duy

The code which parses untracked-cache extensions from disk keeps a "len"
variable, which is the size of the string we are parsing. But since we
now have an "end of string" variable, we can just use that to get the
length when we need it. This eliminates the need to keep "len" up to
date (and removes the possibility of any errors where "len" and "eos"
get out of sync).

As a bonus, it means we are not storing a string length in an "int",
which is a potential source of overflows (though in this case it seems
fairly unlikely for that to cause any memory problems).

Signed-off-by: Jeff King <peff@peff.net>
---
 dir.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index 17865f44df..60438b2cdc 100644
--- a/dir.c
+++ b/dir.c
@@ -2735,7 +2735,7 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
 	const unsigned char *data = rd->data, *end = rd->end;
 	const unsigned char *eos;
 	unsigned int value;
-	int i, len;
+	int i;
 
 	memset(&ud, 0, sizeof(ud));
 
@@ -2756,28 +2756,25 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
 	eos = memchr(data, '\0', end - data);
 	if (!eos || eos == end)
 		return -1;
-	len = eos - data;
 
-	*untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), len, 1));
+	*untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), eos - data, 1));
 	memcpy(untracked, &ud, sizeof(ud));
-	memcpy(untracked->name, data, len + 1);
+	memcpy(untracked->name, data, eos - data + 1);
 	data = eos + 1;
 
 	for (i = 0; i < untracked->untracked_nr; i++) {
 		eos = memchr(data, '\0', end - data);
 		if (!eos || eos == end)
 			return -1;
-		len = eos - data;
-		untracked->untracked[i] = xmemdupz(data, len);
+		untracked->untracked[i] = xmemdupz(data, eos - data);
 		data = eos + 1;
 	}
 
 	rd->ucd[rd->index++] = untracked;
 	rd->data = data;
 
 	for (i = 0; i < untracked->dirs_nr; i++) {
-		len = read_one_dir(untracked->dirs + i, rd);
-		if (len < 0)
+		if (read_one_dir(untracked->dirs + i, rd) < 0)
 			return -1;
 	}
 	return 0;
-- 
2.21.0.1092.g8b0302e9c4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/3] untracked-cache: use FLEX_ALLOC to create internal structs
  2019-04-18 21:14       ` [PATCH 0/3] untracked cache parsing fixups Jeff King
                           ` (2 preceding siblings ...)
  2019-04-18 21:18         ` [PATCH 3/3] untracked-cache: simplify parsing by dropping "len" Jeff King
@ 2019-04-18 21:24         ` Jeff King
  2019-04-19  9:18           ` Duy Nguyen
  3 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2019-04-18 21:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin,
	Nguyễn Thái Ngọc Duy

On Thu, Apr 18, 2019 at 05:14:08PM -0400, Jeff King wrote:

> Just so we don't forget about it, I wrote this fix up as a patch. And in
> fact it led to a few other cleanups. I think the first one is definitely
> worth doing now, even if there are other similar cases lurking in the
> rest of the index code.
> 
> The other two are optional, though I think they are worth it (and not
> too hard to verify that they are doing the right thing).
> 
> These are on top of js/untracked-cache-allocfix (though they could
> easily be ported to a separate topic if we want).
> 
>   [1/3]: untracked-cache: be defensive about missing NULs in index
>   [2/3]: untracked-cache: simplify parsing by dropping "next"
>   [3/3]: untracked-cache: simplify parsing by dropping "len"

I also wondered if we could just accept the cost of calloc() here and
use FLEX_ALLOC to simplify things. That resulted in the patch below, but
I didn't include it with the initial 3, because I think it's too
subtle/gross for my tastes.

-- >8 --
Subject: untracked-cache: use FLEX_ALLOC to create internal structs

The untracked_cache_dir struct has a FLEX_ARRAY in it. Let's use
FLEX_ALLOC_MEM to allocate it, which saves us having to compute the
length ourselves.

In theory this could be slightly slower, since the FLEX_ALLOC macros use
calloc (and we just memcpy over most of the contents anyway). But in
practice this distinction is not generally measurable.

Note that because we then fill in the pre-flex elements of the struct
using a memcpy, we need to take care to use the exact size of that
space and _not_ "sizeof(ud)", since the latter may include padding (or
even an extra byte on systems where FLEX_ARRAY is 1).

Signed-off-by: Jeff King <peff@peff.net>
---
If we wanted to go this route, I think it would make sense to provide a
FLEX_ALLOC macro that takes a "template" set of bytes as a ptr/len pair,
and writes it before we fill in the flex portion.

Then we could do something like:

  FLEX_ALLOC_COPY(untracked, &ud, sizeof(ud), name, data, eos - data);

If this is the only such case, it's probably not worth it (I didn't
really look around for more, though).

 dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 60438b2cdc..7cd4eec198 100644
--- a/dir.c
+++ b/dir.c
@@ -2757,9 +2757,9 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
 	if (!eos || eos == end)
 		return -1;
 
-	*untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), eos - data, 1));
-	memcpy(untracked, &ud, sizeof(ud));
-	memcpy(untracked->name, data, eos - data + 1);
+	FLEX_ALLOC_MEM(untracked, name, data, eos - data);
+	memcpy(untracked, &ud, offsetof(struct untracked_cache_dir, name));
+	*untracked_ = untracked;
 	data = eos + 1;
 
 	for (i = 0; i < untracked->untracked_nr; i++) {
-- 
2.21.0.1092.g8b0302e9c4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/3] untracked-cache: use FLEX_ALLOC to create internal structs
  2019-04-18 21:24         ` [PATCH 4/3] untracked-cache: use FLEX_ALLOC to create internal structs Jeff King
@ 2019-04-19  9:18           ` Duy Nguyen
  2019-04-19 19:43             ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Duy Nguyen @ 2019-04-19  9:18 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget,
	Git Mailing List, Johannes Schindelin

On Fri, Apr 19, 2019 at 4:24 AM Jeff King <peff@peff.net> wrote:
>
> On Thu, Apr 18, 2019 at 05:14:08PM -0400, Jeff King wrote:
>
> > Just so we don't forget about it, I wrote this fix up as a patch. And in
> > fact it led to a few other cleanups. I think the first one is definitely
> > worth doing now, even if there are other similar cases lurking in the
> > rest of the index code.
> >
> > The other two are optional, though I think they are worth it (and not
> > too hard to verify that they are doing the right thing).
> >
> > These are on top of js/untracked-cache-allocfix (though they could
> > easily be ported to a separate topic if we want).
> >
> >   [1/3]: untracked-cache: be defensive about missing NULs in index
> >   [2/3]: untracked-cache: simplify parsing by dropping "next"
> >   [3/3]: untracked-cache: simplify parsing by dropping "len"
>
> I also wondered if we could just accept the cost of calloc() here and
> use FLEX_ALLOC to simplify things. That resulted in the patch below, but
> I didn't include it with the initial 3, because I think it's too
> subtle/gross for my tastes.

It's probably ok. If I remember correctly, reading UNTR extension (on
a huge repo) took the longest time after Ben added support for reading
the index with multiple threads. So performance is a concern, but I
don't think calloc() would be the problem. malloc() itself without the
memory pool could probably slow down more when we have lots and lots
of directories.

> -- >8 --
> Subject: untracked-cache: use FLEX_ALLOC to create internal structs
>
> The untracked_cache_dir struct has a FLEX_ARRAY in it. Let's use
> FLEX_ALLOC_MEM to allocate it, which saves us having to compute the
> length ourselves.
>
> In theory this could be slightly slower, since the FLEX_ALLOC macros use
> calloc (and we just memcpy over most of the contents anyway). But in
> practice this distinction is not generally measurable.
>
> Note that because we then fill in the pre-flex elements of the struct
> using a memcpy, we need to take care to use the exact size of that
> space and _not_ "sizeof(ud)", since the latter may include padding (or
> even an extra byte on systems where FLEX_ARRAY is 1).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> If we wanted to go this route, I think it would make sense to provide a
> FLEX_ALLOC macro that takes a "template" set of bytes as a ptr/len pair,
> and writes it before we fill in the flex portion.
>
> Then we could do something like:
>
>   FLEX_ALLOC_COPY(untracked, &ud, sizeof(ud), name, data, eos - data);
>
> If this is the only such case, it's probably not worth it (I didn't
> really look around for more, though).
>
>  dir.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 60438b2cdc..7cd4eec198 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2757,9 +2757,9 @@ static int read_one_dir(struct untracked_cache_dir **untracked_,
>         if (!eos || eos == end)
>                 return -1;
>
> -       *untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), eos - data, 1));
> -       memcpy(untracked, &ud, sizeof(ud));
> -       memcpy(untracked->name, data, eos - data + 1);
> +       FLEX_ALLOC_MEM(untracked, name, data, eos - data);
> +       memcpy(untracked, &ud, offsetof(struct untracked_cache_dir, name));
> +       *untracked_ = untracked;
>         data = eos + 1;
>
>         for (i = 0; i < untracked->untracked_nr; i++) {
> --
> 2.21.0.1092.g8b0302e9c4
>


-- 
Duy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/3] untracked-cache: use FLEX_ALLOC to create internal structs
  2019-04-19  9:18           ` Duy Nguyen
@ 2019-04-19 19:43             ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2019-04-19 19:43 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Junio C Hamano, Johannes Schindelin via GitGitGadget,
	Git Mailing List, Johannes Schindelin

On Fri, Apr 19, 2019 at 04:18:25PM +0700, Duy Nguyen wrote:

> > I also wondered if we could just accept the cost of calloc() here and
> > use FLEX_ALLOC to simplify things. That resulted in the patch below, but
> > I didn't include it with the initial 3, because I think it's too
> > subtle/gross for my tastes.
> 
> It's probably ok. If I remember correctly, reading UNTR extension (on
> a huge repo) took the longest time after Ben added support for reading
> the index with multiple threads. So performance is a concern, but I
> don't think calloc() would be the problem. malloc() itself without the
> memory pool could probably slow down more when we have lots and lots
> of directories.

I think if we do the FLEX_ALLOC_COPY() thing I mentioned that it would
probably _not_ use calloc() there, since we'd know we were copying in
the content from elsewhere. So that concern would go away either way. :)

(But I'm still skeptical that FLEX_ALLOC_COPY() is worth it unless we
can find at least one other caller).

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-04-19 20:02 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-10 12:56 [PATCH 0/1] Fix an off-by-one bug in the untracked cache code Johannes Schindelin via GitGitGadget
2019-04-10 12:56 ` [PATCH 1/1] untracked cache: fix off-by-one Johannes Schindelin via GitGitGadget
2019-04-10 16:20   ` Jeff King
2019-04-12  1:48     ` Junio C Hamano
2019-04-18 21:14       ` [PATCH 0/3] untracked cache parsing fixups Jeff King
2019-04-18 21:17         ` [PATCH 1/3] untracked-cache: be defensive about missing NULs in index Jeff King
2019-04-19  5:29           ` Junio C Hamano
2019-04-18 21:17         ` [PATCH 2/3] untracked-cache: simplify parsing by dropping "next" Jeff King
2019-04-19  5:33           ` Junio C Hamano
2019-04-18 21:18         ` [PATCH 3/3] untracked-cache: simplify parsing by dropping "len" Jeff King
2019-04-18 21:24         ` [PATCH 4/3] untracked-cache: use FLEX_ALLOC to create internal structs Jeff King
2019-04-19  9:18           ` Duy Nguyen
2019-04-19 19:43             ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).