git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/3] some small range-diff read_patches() fixes
@ 2021-08-09 22:45 Jeff King
  2021-08-09 22:47 ` [PATCH 1/3] range-diff: drop useless "offset" variable from read_patches() Jeff King
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Jeff King @ 2021-08-09 22:45 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer

Amidst all the talk of clang4 in another thread, I noticed that Debian
unstable recently shipped a clang-14 package. So I tried it out, and it
does find one small cleanup. And then looking at the surrounding code
helped me find 2 more. :)

  [1/3]: range-diff: drop useless "offset" variable from read_patches()
  [2/3]: range-diff: handle unterminated lines in read_patches()
  [3/3]: range-diff: use ssize_t for parsed "len" in read_patches()

 range-diff.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] range-diff: drop useless "offset" variable from read_patches()
  2021-08-09 22:45 [PATCH 0/3] some small range-diff read_patches() fixes Jeff King
@ 2021-08-09 22:47 ` Jeff King
  2021-08-09 22:48 ` [PATCH 2/3] range-diff: handle unterminated lines in read_patches() Jeff King
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2021-08-09 22:47 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer

The "offset" variable was introduced in 44b67cb62b (range-diff:
split lines manually, 2019-07-11), but it has never done anything
useful. We use it to count up the number of bytes we've consumed, but we
never look at the result. It was probably copied accidentally from an
almost-identical loop in apply.c:find_header() (and the point of that
commit was to make use of the parse_git_diff_header() function which
underlies both).

Because the variable is assigned (it is updated on every iteration) but
never read, most compilers don't seem to notice it. The upcoming
clang-14 does complain, via its -Wunused-but-set-variable warning.
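For illustration only, here is a minimal C sketch of the pattern that -Wunused-but-set-variable targets: a variable that is written on every iteration but whose value is never read, just like "offset" above. The function name is hypothetical and not from range-diff.c, and the sketch does not establish which compiler versions or flag sets report it.

```c
#include <stddef.h>

/*
 * Hypothetical example: "total" is updated on every iteration, like
 * "offset" in read_patches(), but nothing ever reads it. A compiler
 * implementing -Wunused-but-set-variable may warn about it.
 */
static size_t count_items(const size_t *lens, size_t nr)
{
	size_t i, total = 0;

	for (i = 0; i < nr; i++)
		total += lens[i];	/* set... */

	return nr;			/* ...but never used */
}
```

Deleting "total" (and its update) leaves the result unchanged, which is exactly what this patch does with "offset".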

Signed-off-by: Jeff King <peff@peff.net>
---
The for-loop with an empty initializer and a doubled post-loop operation
is a little funny to see. I didn't see an easy way to make it less ugly
(pushing the line/size initialization into the for() would work, but the
resulting line is awfully long).
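To make the note above concrete, the sketch below writes the same line-walking loop both ways. It assumes a hypothetical "struct buffer" standing in for git's strbuf (only its buf and len fields are mimicked) and counts lines instead of parsing them. Both functions should behave identically, so the choice between them is purely about readability.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the strbuf fields used by read_patches(). */
struct buffer {
	const char *buf;
	size_t len;
};

/* The shape the patch uses: set up before the loop, empty init clause. */
static size_t count_lines_empty_init(const struct buffer *contents)
{
	const char *line;
	size_t size, len, lines = 0;

	line = contents->buf;
	size = contents->len;
	for (; size > 0; size -= len, line += len) {
		const char *eol = memchr(line, '\n', size);

		len = eol ? (size_t)(eol + 1 - line) : size;
		lines++;
	}
	return lines;
}

/* The alternative from the note: fold the setup into the for(). */
static size_t count_lines_folded_init(const struct buffer *contents)
{
	const char *line;
	size_t size, len, lines = 0;

	for (line = contents->buf, size = contents->len; size > 0; size -= len, line += len) {
		const char *eol = memchr(line, '\n', size);

		len = eol ? (size_t)(eol + 1 - line) : size;
		lines++;
	}
	return lines;
}
```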

 range-diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/range-diff.c b/range-diff.c
index e9479794b4..551600c774 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -49,7 +49,7 @@ static int read_patches(const char *range, struct string_list *list,
 	struct patch_util *util = NULL;
 	int in_header = 1;
 	char *line, *current_filename = NULL;
-	int offset, len;
+	int len;
 	size_t size;
 
 	strvec_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges",
@@ -86,7 +86,7 @@ static int read_patches(const char *range, struct string_list *list,
 
 	line = contents.buf;
 	size = contents.len;
-	for (offset = 0; size > 0; offset += len, size -= len, line += len) {
+	for (; size > 0; size -= len, line += len) {
 		const char *p;
 
 		len = find_end_of_line(line, size);
-- 
2.33.0.rc1.475.g023efe0ae4



* [PATCH 2/3] range-diff: handle unterminated lines in read_patches()
  2021-08-09 22:45 [PATCH 0/3] some small range-diff read_patches() fixes Jeff King
  2021-08-09 22:47 ` [PATCH 1/3] range-diff: drop useless "offset" variable from read_patches() Jeff King
@ 2021-08-09 22:48 ` Jeff King
  2021-08-09 22:48 ` [PATCH 3/3] range-diff: use ssize_t for parsed "len" " Jeff King
  2021-08-10 14:47 ` [PATCH 0/3] some small range-diff read_patches() fixes Derrick Stolee
  3 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2021-08-09 22:48 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer

When parsing our buffer of output from git-log, we have a
find_end_of_line() helper that finds the next newline, and gives us the
number of bytes to move past it, or the size of the whole remaining
buffer if there is no newline.

But trying to handle both those cases leads to some oddities:

  - we try to overwrite the newline with NUL in the caller, by writing
    over line[len-1]. This is at best redundant, since the helper will
    already have done so if it saw a newline. But if it didn't see a
    newline, it's actively wrong; we'll overwrite the byte at the end of
    the (unterminated) line.

    We could solve this by just dropping the extra NUL assignment in the
    caller and letting the helper do the right thing. But...

  - if we see a "diff --git" line, we'll restore the newline on top of
    the NUL byte, so we can pass the string to parse_git_diff_header().
    But if there was no newline in the first place, we can't do this.
    There's no place to put it (the current code writes a newline
    over whatever byte we obliterated earlier). The best we can do is
    feed the complete remainder of the buffer to the function (which is,
    in fact, a string, by virtue of being a strbuf).
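A small reproduction of the first problem, using a copy of the removed find_end_of_line() helper and a hypothetical driver that mimics the old caller's extra "line[len - 1] = '\0'" write. With a buffer whose last line has no trailing newline, the final byte of that line is lost.

```c
#include <stddef.h>
#include <string.h>

/* Copy of the helper that this patch removes from range-diff.c. */
static size_t find_end_of_line(char *buffer, unsigned long size)
{
	char *eol = memchr(buffer, '\n', size);

	if (!eol)
		return size;

	*eol = '\0';
	return eol + 1 - buffer;
}

/*
 * Hypothetical driver that mimics the old caller: split buf in place
 * and return the last line as the caller would have seen it.
 */
static const char *last_line_old_logic(char *buf, size_t size)
{
	char *line = buf;
	const char *last = NULL;
	size_t len;

	for (; size > 0; size -= len, line += len) {
		len = find_end_of_line(line, size);
		line[len - 1] = '\0';	/* clobbers a real byte if no '\n' was found */
		last = line;
	}
	return last;
}
```

For "abc\ndef", the first line comes out as "abc" (the extra write is merely redundant), but the unterminated last line comes out as "de".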

To solve this, the caller needs to know whether we actually found a
newline or not. We could modify find_end_of_line() to return that
information, but we can further observe that it has only one caller.
So let's just inline it in that caller.
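The inlined version can be sketched as follows. The function name and its out-parameters are hypothetical; the loop body follows the patch, keeping "eol" around so that a later step knows whether there is a newline to restore. An unterminated final line keeps all of its bytes and relies on the buffer's trailing NUL, which a strbuf guarantees.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: split "line" into NUL-terminated lines in place,
 * reporting the last line and whether it actually ended in a newline.
 * Returns the number of lines found.
 */
static size_t split_lines_in_place(char *line, size_t size,
				   char **last_line, int *last_had_newline)
{
	size_t n = 0, len;

	for (; size > 0; size -= len, line += len) {
		char *eol = memchr(line, '\n', size);

		if (eol) {
			*eol = '\0';
			len = eol + 1 - line;
		} else {
			len = size;	/* unterminated: the buffer's NUL ends it */
		}
		*last_line = line;
		*last_had_newline = eol != NULL;
		n++;
		/* To hand the raw text on, a caller would do: if (eol) *eol = '\n'; */
	}
	return n;
}
```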

Nobody seems to have noticed this case, probably because git-log would
never produce input that doesn't end with a newline. Arguably we could
just return an error as soon as we see that the output does not end in a
newline. But the code to do so actually ends up _longer_, mostly because
of the cleanup we have to do in handling the error.

Signed-off-by: Jeff King <peff@peff.net>
---
I had initially hoped to just delete the redundant line, but the details
and the explanation got surprisingly tricky (especially for something
that we don't expect to see in practice anyway). Sorry. :)

 range-diff.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/range-diff.c b/range-diff.c
index 551600c774..088db1b1ce 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -26,17 +26,6 @@ struct patch_util {
 	struct object_id oid;
 };
 
-static size_t find_end_of_line(char *buffer, unsigned long size)
-{
-	char *eol = memchr(buffer, '\n', size);
-
-	if (!eol)
-		return size;
-
-	*eol = '\0';
-	return eol + 1 - buffer;
-}
-
 /*
  * Reads the patches into a string list, with the `util` field being populated
  * as struct object_id (will need to be free()d).
@@ -88,9 +77,16 @@ static int read_patches(const char *range, struct string_list *list,
 	size = contents.len;
 	for (; size > 0; size -= len, line += len) {
 		const char *p;
+		char *eol;
+
+		eol = memchr(line, '\n', size);
+		if (eol) {
+			*eol = '\0';
+			len = eol + 1 - line;
+		} else {
+			len = size;
+		}
 
-		len = find_end_of_line(line, size);
-		line[len - 1] = '\0';
 		if (skip_prefix(line, "commit ", &p)) {
 			if (util) {
 				string_list_append(list, buf.buf)->util = util;
@@ -132,7 +128,8 @@ static int read_patches(const char *range, struct string_list *list,
 			strbuf_addch(&buf, '\n');
 			if (!util->diff_offset)
 				util->diff_offset = buf.len;
-			line[len - 1] = '\n';
+			if (eol)
+				*eol = '\n';
 			orig_len = len;
 			len = parse_git_diff_header(&root, &linenr, 0, line,
 						    len, size, &patch);
-- 
2.33.0.rc1.475.g023efe0ae4



* [PATCH 3/3] range-diff: use ssize_t for parsed "len" in read_patches()
  2021-08-09 22:45 [PATCH 0/3] some small range-diff read_patches() fixes Jeff King
  2021-08-09 22:47 ` [PATCH 1/3] range-diff: drop useless "offset" variable from read_patches() Jeff King
  2021-08-09 22:48 ` [PATCH 2/3] range-diff: handle unterminated lines in read_patches() Jeff King
@ 2021-08-09 22:48 ` Jeff King
  2021-08-10 14:47 ` [PATCH 0/3] some small range-diff read_patches() fixes Derrick Stolee
  3 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2021-08-09 22:48 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer

As we iterate through the buffer containing git-log output, parsing
lines, we use an "int" to store the size of an individual line. This
should be a size_t, as we have no guarantee that there is not a
malicious 2GB+ commit-message line in the output.

Overflowing this integer probably doesn't do anything _too_ terrible. We
are not using the value to size a buffer, so the worst case is probably
an out-of-bounds read from before the array. But it's easy enough to
fix.
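The arithmetic behind the 2GB figure, assuming the common 32-bit int: INT_MAX is 2147483647, so a line of 2 GiB (2147483648 bytes) is one byte too long to be stored in an int. The hypothetical helper below just makes that comparison explicit.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical check: can a line of n bytes be stored in an int
 * without overflow? With a 32-bit int, anything of 2GB or more cannot.
 */
static int line_len_fits_in_int(size_t n)
{
	return n <= (size_t)INT_MAX;
}
```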

Note that we have to use ssize_t here, since we also store the length
result from parse_git_diff_header(), which may return a negative value
for error. That function actually returns an int itself, which has a
similar overflow problem, but I'll leave that for another day. Much
of the apply.c code uses ints and should be converted as a whole. In the
meantime, a negative return from parse_git_diff_header() will be
interpreted as an error, and we'll bail. So we can't handle such a case,
but given that it's likely to be malicious anyway, the important thing
is that we don't have any memory errors.
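A sketch of why a signed type is needed here: one variable holds either a byte count or the parser's negative error, and the error has to be caught before it is used to advance the buffer. parse_header_stub() and count_headers() are hypothetical stand-ins, not git functions; ssize_t comes from POSIX <sys/types.h>.

```c
#include <string.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/*
 * Hypothetical stand-in for parse_git_diff_header(): returns the number
 * of bytes consumed, or a negative value on error. Like the real
 * function, it returns int.
 */
static int parse_header_stub(const char *line)
{
	const char *eol;

	if (strncmp(line, "diff --git ", 11))
		return -1;		/* error: not a diff header */
	eol = strchr(line, '\n');
	return eol ? (int)(eol + 1 - line) : (int)strlen(line);
}

/*
 * Hypothetical caller: "len" must be signed so that the error can be
 * detected before it is used to advance "line" and shrink "size".
 */
static int count_headers(const char *line, size_t size)
{
	int count = 0;
	ssize_t len;

	for (; size > 0; size -= (size_t)len, line += len) {
		len = parse_header_stub(line);
		if (len < 0)
			return -1;	/* bail out before using len */
		count++;
	}
	return count;
}
```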

Signed-off-by: Jeff King <peff@peff.net>
---
 range-diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/range-diff.c b/range-diff.c
index 088db1b1ce..e731525e66 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -38,7 +38,7 @@ static int read_patches(const char *range, struct string_list *list,
 	struct patch_util *util = NULL;
 	int in_header = 1;
 	char *line, *current_filename = NULL;
-	int len;
+	ssize_t len;
 	size_t size;
 
 	strvec_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges",
-- 
2.33.0.rc1.475.g023efe0ae4


* Re: [PATCH 0/3] some small range-diff read_patches() fixes
  2021-08-09 22:45 [PATCH 0/3] some small range-diff read_patches() fixes Jeff King
                   ` (2 preceding siblings ...)
  2021-08-09 22:48 ` [PATCH 3/3] range-diff: use ssize_t for parsed "len" " Jeff King
@ 2021-08-10 14:47 ` Derrick Stolee
  2021-08-14 21:48   ` Johannes Schindelin
  3 siblings, 1 reply; 6+ messages in thread
From: Derrick Stolee @ 2021-08-10 14:47 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Thomas Gummerer

On 8/9/2021 6:45 PM, Jeff King wrote:
> Amidst all the talk of clang4 in another thread, I noticed that Debian
> unstable recently shipped a clang-14 package. So I tried it out, and it
> does find one small cleanup. And then looking at the surrounding code
> helped me find 2 more. :)
> 
>   [1/3]: range-diff: drop useless "offset" variable from read_patches()
>   [2/3]: range-diff: handle unterminated lines in read_patches()
>   [3/3]: range-diff: use ssize_t for parsed "len" in read_patches()

I gave these a read. The code diffs are obviously correct and the
explanations are well motivated. Thanks.

-Stolee


* Re: [PATCH 0/3] some small range-diff read_patches() fixes
  2021-08-10 14:47 ` [PATCH 0/3] some small range-diff read_patches() fixes Derrick Stolee
@ 2021-08-14 21:48   ` Johannes Schindelin
  0 siblings, 0 replies; 6+ messages in thread
From: Johannes Schindelin @ 2021-08-14 21:48 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jeff King, git, Thomas Gummerer

Hi,

On Tue, 10 Aug 2021, Derrick Stolee wrote:

> On 8/9/2021 6:45 PM, Jeff King wrote:
> > Amidst all the talk of clang4 in another thread, I noticed that Debian
> > unstable recently shipped a clang-14 package. So I tried it out, and it
> > does find one small cleanup. And then looking at the surrounding code
> > helped me find 2 more. :)
> >
> >   [1/3]: range-diff: drop useless "offset" variable from read_patches()
> >   [2/3]: range-diff: handle unterminated lines in read_patches()
> >   [3/3]: range-diff: use ssize_t for parsed "len" in read_patches()
>
> I gave these a read. The code diffs are obviously correct and the
> explanations are well motivated. Thanks.

Same here. Thanks, both,
Dscho


Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
