git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] cleanup: fix possible overflow errors in binary search, part 2
@ 2019-06-13 17:51 René Scharfe
  2019-06-13 18:06 ` Derrick Stolee
  2019-06-13 19:42 ` Martin Ågren
  0 siblings, 2 replies; 5+ messages in thread
From: René Scharfe @ 2019-06-13 17:51 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Derrick Stolee, Jeff Hostetler, Junio C Hamano

Calculating the sum of two array indexes to find the midpoint between
them can overflow, i.e. code like this is unsafe for big arrays:

	mid = (first + last) >> 1;

Make sure the intermediate value stays within the boundaries instead,
like this:

	mid = first + ((last - first) >> 1);

The loop condition of the binary search makes sure that 'last' is
always greater than 'first', so this is safe as long as 'first' is
not negative.  And that can be verified easily using the pre-context
of each change, except for name-hash.c, so add an assertion to that
effect there.

The unsafe calculations were found with:

	git grep '(.*+.*) *>> *1'

This is a continuation of 19716b21a4 (cleanup: fix possible overflow
errors in binary search, 2017-10-08).

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 builtin/ls-files.c  | 2 +-
 diffcore-rename.c   | 4 ++--
 dir.c               | 2 +-
 name-hash.c         | 3 ++-
 read-cache.c        | 2 +-
 sh-i18n--envsubst.c | 2 +-
 6 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 7f83c9a6f2..670e8fb93c 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -373,7 +373,7 @@ static void prune_index(struct index_state *istate,
 	first = pos;
 	last = istate->cache_nr;
 	while (last > first) {
-		int next = (last + first) >> 1;
+		int next = first + ((last - first) >> 1);
 		const struct cache_entry *ce = istate->cache[next];
 		if (!strncmp(ce->name, prefix, prefixlen)) {
 			first = next+1;
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 07bd34b631..6af92d5eba 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -23,7 +23,7 @@ static int find_rename_dst(struct diff_filespec *two)
 	first = 0;
 	last = rename_dst_nr;
 	while (last > first) {
-		int next = (last + first) >> 1;
+		int next = first + ((last - first) >> 1);
 		struct diff_rename_dst *dst = &(rename_dst[next]);
 		int cmp = strcmp(two->path, dst->two->path);
 		if (!cmp)
@@ -83,7 +83,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
 	first = 0;
 	last = rename_src_nr;
 	while (last > first) {
-		int next = (last + first) >> 1;
+		int next = first + ((last - first) >> 1);
 		struct diff_rename_src *src = &(rename_src[next]);
 		int cmp = strcmp(one->path, src->p->one->path);
 		if (!cmp)
diff --git a/dir.c b/dir.c
index ba4a51c296..d021c908e5 100644
--- a/dir.c
+++ b/dir.c
@@ -701,7 +701,7 @@ static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
 	first = 0;
 	last = dir->dirs_nr;
 	while (last > first) {
-		int cmp, next = (last + first) >> 1;
+		int cmp, next = first + ((last - first) >> 1);
 		d = dir->dirs[next];
 		cmp = strncmp(name, d->name, len);
 		if (!cmp && strlen(d->name) > len)
diff --git a/name-hash.c b/name-hash.c
index b4861bc7b0..695908609f 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -345,8 +345,9 @@ static int handle_range_dir(
 	else {
 		int begin = k_start;
 		int end = k_end;
+		assert(begin >= 0);
 		while (begin < end) {
-			int mid = (begin + end) >> 1;
+			int mid = begin + ((end - begin) >> 1);
 			int cmp = strncmp(istate->cache[mid]->name, prefix->buf, prefix->len);
 			if (cmp == 0) /* mid has same prefix; look in second part */
 				begin = mid + 1;
diff --git a/read-cache.c b/read-cache.c
index 22e7b9944e..4f81fca320 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -549,7 +549,7 @@ static int index_name_stage_pos(const struct index_state *istate, const char *na
 	first = 0;
 	last = istate->cache_nr;
 	while (last > first) {
-		int next = (last + first) >> 1;
+		int next = first + ((last - first) >> 1);
 		struct cache_entry *ce = istate->cache[next];
 		int cmp = cache_name_stage_compare(name, namelen, stage, ce->name, ce_namelen(ce), ce_stage(ce));
 		if (!cmp)
diff --git a/sh-i18n--envsubst.c b/sh-i18n--envsubst.c
index cecfdd36c7..e7430b9aa8 100644
--- a/sh-i18n--envsubst.c
+++ b/sh-i18n--envsubst.c
@@ -249,7 +249,7 @@ sorted_string_list_member (const string_list_ty *slp, const char *s)
 	{
 	  /* Here we know that if s is in the list, it is at an index j
 	     with j1 <= j < j2.  */
-	  size_t j = (j1 + j2) >> 1;
+	  size_t j = j1 + ((j2 - j1) >> 1);
 	  int result = strcmp (slp->item[j], s);

 	  if (result > 0)
--
2.22.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2
  2019-06-13 17:51 [PATCH] cleanup: fix possible overflow errors in binary search, part 2 René Scharfe
@ 2019-06-13 18:06 ` Derrick Stolee
  2019-06-13 19:42 ` Martin Ågren
  1 sibling, 0 replies; 5+ messages in thread
From: Derrick Stolee @ 2019-06-13 18:06 UTC (permalink / raw)
  To: René Scharfe, Git Mailing List
  Cc: Derrick Stolee, Jeff Hostetler, Junio C Hamano

On 6/13/2019 1:51 PM, René Scharfe wrote:
> Calculating the sum of two array indexes to find the midpoint between
> them can overflow, i.e. code like this is unsafe for big arrays:
> 
> 	mid = (first + last) >> 1;
> 
> Make sure the intermediate value stays within the boundaries instead,
> like this:
> 
> 	mid = first + ((last - first) >> 1);
> 
> The loop condition of the binary search makes sure that 'last' is
> always greater than 'first', so this is safe as long as 'first' is
> not negative.  And that can be verified easily using the pre-context
> of each change, except for name-hash.c, so add an assertion to that
> effect there.
> 
> The unsafe calculations were found with:
> 
> 	git grep '(.*+.*) *>> *1'
> 
> This is a continuation of 19716b21a4 (cleanup: fix possible overflow
> errors in binary search, 2017-10-08).

Thank you for finding these additional examples!

LGTM. Pretty easy to see this is correct.

-Stolee


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2
  2019-06-13 17:51 [PATCH] cleanup: fix possible overflow errors in binary search, part 2 René Scharfe
  2019-06-13 18:06 ` Derrick Stolee
@ 2019-06-13 19:42 ` Martin Ågren
  2019-06-13 21:33   ` René Scharfe
  1 sibling, 1 reply; 5+ messages in thread
From: Martin Ågren @ 2019-06-13 19:42 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git Mailing List, Derrick Stolee, Jeff Hostetler, Junio C Hamano

On Thu, 13 Jun 2019 at 19:54, René Scharfe <l.s.r@web.de> wrote:
>
> Calculating the sum of two array indexes to find the midpoint between
> them can overflow, i.e. code like this is unsafe for big arrays:
>
>         mid = (first + last) >> 1;
>
> Make sure the intermediate value stays within the boundaries instead,
> like this:
>
>         mid = first + ((last - first) >> 1);
>
> The loop condition of the binary search makes sure that 'last' is
> always greater than 'first', so this is safe as long as 'first' is
> not negative.  And that can be verified easily using the pre-context
> of each change, except for name-hash.c, so add an assertion to that
> effect there.

Right, with "safe", one might mean something like "no undefined behavior
due to shifting a signed value with the high bit set". Especially since
we're worrying about overflows, we're obviously having large values in
mind, so we're right to consider the sign bit. But, we're fine as you
note.  Because we subtract, and `last` doesn't have its sign bit set,
and `first` is non-negative and not greater than `last`, the sign bit of
`(last - first)` is always zero.

So all is well. But maybe we should write `(last - first) / 2` anyway.
We could then drop the extra parenthesis, and we would keep future
readers (and static analysis?) from wondering whether we might ever be
shifting a signed value with the sign bit set. A few spots fewer to
audit in the future...

Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2
  2019-06-13 19:42 ` Martin Ågren
@ 2019-06-13 21:33   ` René Scharfe
  2019-06-14  4:18     ` Martin Ågren
  0 siblings, 1 reply; 5+ messages in thread
From: René Scharfe @ 2019-06-13 21:33 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Git Mailing List, Derrick Stolee, Jeff Hostetler, Junio C Hamano

Am 13.06.19 um 21:42 schrieb Martin Ågren:
> On Thu, 13 Jun 2019 at 19:54, René Scharfe <l.s.r@web.de> wrote:
>>
>> Calculating the sum of two array indexes to find the midpoint between
>> them can overflow, i.e. code like this is unsafe for big arrays:
>>
>>         mid = (first + last) >> 1;
>>
>> Make sure the intermediate value stays within the boundaries instead,
>> like this:
>>
>>         mid = first + ((last - first) >> 1);
>>
>> The loop condition of the binary search makes sure that 'last' is
>> always greater than 'first', so this is safe as long as 'first' is
>> not negative.  And that can be verified easily using the pre-context
>> of each change, except for name-hash.c, so add an assertion to that
>> effect there.
>
> Right, with "safe", one might mean something like "no undefined behavior
> due to shifting a signed value with the high bit set". Especially since
> we're worrying about overflows, we're obviously having large values in
> mind, so we're right to consider the sign bit. But, we're fine as you
> note.  Because we subtract, and `last` doesn't have its sign bit set,
> and `first` is non-negative and not greater than `last`, the sign bit of
> `(last - first)` is always zero.
>
> So all is well. But maybe we should write `(last - first) / 2` anyway.
> We could then drop the extra parenthesis, and we would keep future
> readers (and static analysis?) from wondering whether we might ever be
> shifting a signed value with the sign bit set. A few spots fewer to
> audit in the future...

Yes, thought about that.  When I saw Clang 8 emitting extra opcodes for
handling the sign for the version with division I decided to restrict
the patch to just do overflow prevention and leave the right shifts in
place..

René

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] cleanup: fix possible overflow errors in binary search, part 2
  2019-06-13 21:33   ` René Scharfe
@ 2019-06-14  4:18     ` Martin Ågren
  0 siblings, 0 replies; 5+ messages in thread
From: Martin Ågren @ 2019-06-14  4:18 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git Mailing List, Derrick Stolee, Jeff Hostetler, Junio C Hamano

On Thu, 13 Jun 2019 at 23:33, René Scharfe <l.s.r@web.de> wrote:
>
> Am 13.06.19 um 21:42 schrieb Martin Ågren:
> > On Thu, 13 Jun 2019 at 19:54, René Scharfe <l.s.r@web.de> wrote:

> >> Make sure the intermediate value stays within the boundaries instead,
> >> like this:
> >>
> >>         mid = first + ((last - first) >> 1);
> >>
> >> The loop condition of the binary search makes sure that 'last' is
> >> always greater than 'first', so this is safe as long as 'first' is
> >> not negative.  And that can be verified easily using the pre-context
> >> of each change, except for name-hash.c, so add an assertion to that
> >> effect there.

> > So all is well. But maybe we should write `(last - first) / 2` anyway.
> > We could then drop the extra parenthesis, and we would keep future
> > readers (and static analysis?) from wondering whether we might ever be
> > shifting a signed value with the sign bit set. A few spots fewer to
> > audit in the future...
>
> Yes, thought about that.  When I saw Clang 8 emitting extra opcodes for
> handling the sign for the version with division I decided to restrict
> the patch to just do overflow prevention and leave the right shifts in
> place..

Ah, ok, I should have known you were on top of it. I guess that means
that clang isn't able to figure out we're guaranteed to be working with
non-negative numbers. Which I guess means that it can't be sure the
shift is safe, but with undefined behavior it has the leeway it needs,
so...

Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-06-14  4:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-13 17:51 [PATCH] cleanup: fix possible overflow errors in binary search, part 2 René Scharfe
2019-06-13 18:06 ` Derrick Stolee
2019-06-13 19:42 ` Martin Ågren
2019-06-13 21:33   ` René Scharfe
2019-06-14  4:18     ` Martin Ågren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).