git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] common_prefix: be more careful about pathspec bounds
@ 2010-06-10 18:24 Thomas Rast
  2010-06-15  8:16 ` Thomas Rast
  2010-06-15 15:05 ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Rast @ 2010-06-10 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

common_prefix() scans backwards from the far end of each 'next'
pathspec, starting from 'len', shortening the 'prefix' using 'path' as
a reference.

However, there was a small opportunity for an out-of-bounds access:
len is unconditionally set to prefix-1 after a "direct match" test
failed.  This means that if 'next' is shorter than prefix+2, we read
past it.

Normally this won't be a problem, which is probably why nobody has
noticed that this was broken since 2006.  Even if we find a slash
somewhere beyond the actual contents of 'next', the memcmp after it
can never match because of the terminating NUL.  However, if we are
unlucky and 'next' is right before a page boundary, we may not have
any read access beyond it.

To fix, only set len to prefix-1 if that is actually inside 'next',
i.e., reduces the available length.  As explained in the last
paragraph, increasing the length never results in more matches.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

Found by valgrind.

 dir.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 5615f33..ca689ff 100644
--- a/dir.c
+++ b/dir.c
@@ -34,9 +34,11 @@ static int common_prefix(const char **pathspec)
 	prefix = slash - path + 1;
 	while ((next = *++pathspec) != NULL) {
 		int len = strlen(next);
-		if (len >= prefix && !memcmp(path, next, prefix))
-			continue;
-		len = prefix - 1;
+		if (len >= prefix) {
+			if (!memcmp(path, next, prefix))
+				continue;
+			len = prefix - 1;
+		}
 		for (;;) {
 			if (!len)
 				return 0;
-- 
1.7.1.553.ga798e

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] common_prefix: be more careful about pathspec bounds
  2010-06-10 18:24 [PATCH] common_prefix: be more careful about pathspec bounds Thomas Rast
@ 2010-06-15  8:16 ` Thomas Rast
  2010-06-15  9:05   ` Johannes Schindelin
  2010-06-15 15:05 ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Rast @ 2010-06-15  8:16 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git, Linus Torvalds

Thomas Rast wrote:
> Normally this won't be a problem, which is probably why nobody has
> noticed that this was broken since 2006.

Actually I just noticed it's not from back in 2006 when Linus wrote
this code, but from the following:

commit c7f34c180b7117cf60ad12a8b180eed33716e390
Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Date:   Mon Apr 23 10:21:25 2007 +0200

    dir.c(common_prefix): Fix two bugs
    
    The function common_prefix() is used to find the common subdirectory of
    a couple of pathnames. When checking if the next pathname matches up with
    the prefix, it incorrectly checked the whole path, not just the prefix
    (including the slash). Thus, the expensive part of the loop was executed
    always.
    
    The other bug is more serious: if the first and the last pathname in the
    list have a longer common prefix than the common prefix for _all_ pathnames
    in the list, the longer one would be chosen. This bug was probably hidden
    by the fact that bash's wildcard expansion sorts the results, and the code
    just so happens to work with sorted input.
[...]
--- a/dir.c
+++ b/dir.c
@@ -24,8 +24,9 @@ int common_prefix(const char **pathspec)
        prefix = slash - path + 1;
        while ((next = *++pathspec) != NULL) {
                int len = strlen(next);
-               if (len >= prefix && !memcmp(path, next, len))
+               if (len >= prefix && !memcmp(path, next, prefix))
                        continue;
+               len = prefix - 1;
                for (;;) {
                        if (!len)
                                return 0;

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] common_prefix: be more careful about pathspec bounds
  2010-06-15  8:16 ` Thomas Rast
@ 2010-06-15  9:05   ` Johannes Schindelin
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2010-06-15  9:05 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Junio C Hamano, git, Linus Torvalds

Hi,

On Tue, 15 Jun 2010, Thomas Rast wrote:

> Thomas Rast wrote:
> > Normally this won't be a problem, which is probably why nobody has
> > noticed that this was broken since 2006.
> 
> Actually I just noticed it's not from back in 2006 when Linus wrote
> this code, but from the following:
> 
> commit c7f34c180b7117cf60ad12a8b180eed33716e390
> Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> Date:   Mon Apr 23 10:21:25 2007 +0200
> 
>     dir.c(common_prefix): Fix two bugs

Sorry, my Git time budget for this week was 0 hours, so I used that up 
quite thoroughly. Besides, I do not know anything about the original 
issue, so there is little chance that I even would know where to start.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] common_prefix: be more careful about pathspec bounds
  2010-06-10 18:24 [PATCH] common_prefix: be more careful about pathspec bounds Thomas Rast
  2010-06-15  8:16 ` Thomas Rast
@ 2010-06-15 15:05 ` Junio C Hamano
  2010-06-15 16:06   ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2010-06-15 15:05 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Linus Torvalds

Thomas Rast <trast@student.ethz.ch> writes:

> common_prefix() scans backwards from the far end of each 'next'
> pathspec, starting from 'len', shortening the 'prefix' using 'path' as
> a reference.
>
> However, there was a small opportunity for an out-of-bounds access:
> len is unconditionally set to prefix-1 after a "direct match" test
> failed.  This means that if 'next' is shorter than prefix+2, we read
> past it.
> ...
> Found by valgrind.
>
>  dir.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 5615f33..ca689ff 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -34,9 +34,11 @@ static int common_prefix(const char **pathspec)
>  	prefix = slash - path + 1;
>  	while ((next = *++pathspec) != NULL) {
>  		int len = strlen(next);
> -		if (len >= prefix && !memcmp(path, next, prefix))
> -			continue;
> -		len = prefix - 1;
> +		if (len >= prefix) {
> +			if (!memcmp(path, next, prefix))
> +				continue;
> +			len = prefix - 1;
> +		}
>  		for (;;) {
>  			if (!len)
>  				return 0;

The structure of this loop is somewhat curious.  It starts out by setting
prefix based on what is found in "path" (i.e. the first proposed common
prefix is the longest leading directory path of "path"), and when it finds
that the prefix being considered does not match "next", it uses what is
found in "next" to shorten it.

Isn't it more intuitive to structure the loop by saying 'Ok, if "path" up
to the currently proposed "prefix" is too long to match, let's shorten it
by one path component and try again'?  IOW, something like...

static int common_prefix(const char **pathspec)
{
	const char *path, *slash, *next;
	int prefix;

	if (!pathspec)
		return 0;

	path = *pathspec;
	slash = strrchr(path, '/');
	if (!slash)
		return 0;

	prefix = slash - path + 1;
	while ((next = *++pathspec) != NULL) {
		int len;
	again:
		len = strlen(next);
		if (len > prefix && !memcmp(path, next, prefix))
			continue;
		while (0 < --prefix && path[prefix - 1] != '/')
			;
		if (!prefix)
			break;
		goto again;
	}
	return prefix;
}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] common_prefix: be more careful about pathspec bounds
  2010-06-15 15:05 ` Junio C Hamano
@ 2010-06-15 16:06   ` Junio C Hamano
  2010-06-15 18:04     ` Thomas Rast
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2010-06-15 16:06 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Linus Torvalds

Junio C Hamano <gitster@pobox.com> writes:

> Isn't it more intuitive to structure the loop by saying 'Ok, if "path" up
> to the currently proposed "prefix" is too long to match, let's shorten it
> by one path component and try again'?

Another way of saying this, which probably needs less number of scans,
would be to shorten prefix to what is known to match --- use of memcmp()
discards this information.

static int common_prefix(const char **pathspec)
{
	const char *path, *slash, *next;
	int prefix;

	if (!pathspec)
		return 0;

	path = *pathspec;
	slash = strrchr(path, '/');
	if (!slash)
		return 0;
	prefix = slash - path + 1;

	while ((next = *++pathspec) != NULL) {
		int len, last_matching_slash = -1;
		for (len = 0;
		     len < prefix && next[len] == path[len];
		     len++)
			if (next[len] == '/')
				last_matching_slash = len;
		if (len == prefix)
			continue;
		if (last_matching_slash < 0)
			return 0;
		prefix = last_matching_slash + 1;
	}
	return prefix;
}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] common_prefix: be more careful about pathspec bounds
  2010-06-15 16:06   ` Junio C Hamano
@ 2010-06-15 18:04     ` Thomas Rast
  2010-06-15 22:12       ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Rast @ 2010-06-15 18:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Isn't it more intuitive to structure the loop by saying 'Ok, if "path" up
> > to the currently proposed "prefix" is too long to match, let's shorten it
> > by one path component and try again'?
> 
> Another way of saying this, which probably needs less number of scans,
> would be to shorten prefix to what is known to match --- use of memcmp()
> discards this information.
[...]
> 	while ((next = *++pathspec) != NULL) {
> 		int len, last_matching_slash = -1;
> 		for (len = 0; len < prefix && next[len] == path[len]; len++)
> 			if (next[len] == '/')
> 				last_matching_slash = len;
> 		if (len == prefix)
> 			continue;
> 		if (last_matching_slash < 0)
> 			return 0;
> 		prefix = last_matching_slash + 1;
> 	}
> 	return prefix;
> }

I really didn't like the two-interleaved-loops version in your last
mail, but this one is way more readable than even the original.

(Why did you wrap the for loop? It's only 76 chars.)

Maybe you could add a comment like

> 	slash = strrchr(path, '/');
> 	if (!slash)
> 		return 0;
	/* The first 'prefix' characters of 'path' always include the
	   trailing slash of the prefix directory */
> 	prefix = slash - path + 1;

to clean up any doubts about the correctness of the matching.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] common_prefix: be more careful about pathspec bounds
  2010-06-15 18:04     ` Thomas Rast
@ 2010-06-15 22:12       ` Junio C Hamano
  2010-06-15 23:02         ` [PATCH] common_prefix: simplify and fix scanning for prefixes Thomas Rast
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2010-06-15 22:12 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Linus Torvalds

Thomas Rast <trast@student.ethz.ch> writes:
> I really didn't like the two-interleaved-loops version in your last
> mail, but this one is way more readable than even the original.
>
> (Why did you wrap the for loop? It's only 76 chars.)

Because I was writing it in my MUA ;-)

>
> Maybe you could add a comment like
>
>> 	slash = strrchr(path, '/');
>> 	if (!slash)
>> 		return 0;
> 	/* The first 'prefix' characters of 'path' always include the
> 	   trailing slash of the prefix directory */
>> 	prefix = slash - path + 1;
>
> to clean up any doubts about the correctness of the matching.

Perhaps..

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] common_prefix: simplify and fix scanning for prefixes
  2010-06-15 22:12       ` Junio C Hamano
@ 2010-06-15 23:02         ` Thomas Rast
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Rast @ 2010-06-15 23:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Junio C Hamano

From: Junio C Hamano <gitster@pobox.com>

common_prefix() scans backwards from the far end of each 'next'
pathspec, starting from 'len', shortening the 'prefix' using 'path' as
a reference.

However, there is a small opportunity for an out-of-bounds access
because len is unconditionally set to prefix-1 after a "direct match"
test failed.  This means that if 'next' is shorter than prefix+2, we
read past it.

Instead of a minimal fix, simplify the loop: scan *forward* over the
'next' entry, remembering the last '/' where it matched the prefix
known so far.  This is far easier to read and also has the advantage
that we only scan over each entry once.

Acked-by: Thomas Rast <trast@student.ethz.ch>
---

Junio C Hamano wrote:
> Thomas Rast <trast@student.ethz.ch> writes:
> > I really didn't like the two-interleaved-loops version in your last
> > mail, but this one is way more readable than even the original.
> >
> > (Why did you wrap the for loop? It's only 76 chars.)
> 
> Because I was writing it in my MUA ;-)

Well then, perhaps I can at least repay the favour by suggesting a
commit message.


 dir.c |   21 ++++++++-------------
 1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/dir.c b/dir.c
index 5e36f8e..78eb869 100644
--- a/dir.c
+++ b/dir.c
@@ -33,20 +33,15 @@ static int common_prefix(const char **pathspec)
 
 	prefix = slash - path + 1;
 	while ((next = *++pathspec) != NULL) {
-		int len = strlen(next);
-		if (len >= prefix && !memcmp(path, next, prefix))
+		int len, last_matching_slash = -1;
+		for (len = 0; len < prefix && next[len] == path[len]; len++)
+			if (next[len] == '/')
+				last_matching_slash = len;
+		if (len == prefix)
 			continue;
-		len = prefix - 1;
-		for (;;) {
-			if (!len)
-				return 0;
-			if (next[--len] != '/')
-				continue;
-			if (memcmp(path, next, len+1))
-				continue;
-			prefix = len + 1;
-			break;
-		}
+		if (last_matching_slash < 0)
+			return 0;
+		prefix = last_matching_slash + 1;
 	}
 	return prefix;
 }
-- 
1.7.1.608.g80d39f

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-06-15 23:02 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-10 18:24 [PATCH] common_prefix: be more careful about pathspec bounds Thomas Rast
2010-06-15  8:16 ` Thomas Rast
2010-06-15  9:05   ` Johannes Schindelin
2010-06-15 15:05 ` Junio C Hamano
2010-06-15 16:06   ` Junio C Hamano
2010-06-15 18:04     ` Thomas Rast
2010-06-15 22:12       ` Junio C Hamano
2010-06-15 23:02         ` [PATCH] common_prefix: simplify and fix scanning for prefixes Thomas Rast

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).