git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Use of strbuf.buf when strbuf.len == 0
@ 2007-09-27  6:21 Junio C Hamano
  2007-09-27 10:13 ` Pierre Habouzit
  2007-09-27 11:22 ` Pierre Habouzit
  0 siblings, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2007-09-27  6:21 UTC (permalink / raw)
  To: git

I am starting to hate some parts of the strbuf API.

Here is an example.  Can you spot what goes wrong?

static int handle_file(const char *path,
	 unsigned char *sha1, const char *output)
{
	SHA_CTX ctx;
	char buf[1024];
	int hunk = 0, hunk_no = 0;
	struct strbuf one, two;
	...
	if (sha1)
		SHA1_Init(&ctx);

	strbuf_init(&one, 0);
	strbuf_init(&two,  0);
	while (fgets(buf, sizeof(buf), f)) {
		if (!prefixcmp(buf, "<<<<<<< "))
			hunk = 1;
		else if (!prefixcmp(buf, "======="))
			hunk = 2;
		else if (!prefixcmp(buf, ">>>>>>> ")) {
			int cmp = strbuf_cmp(&one, &two);

			hunk_no++;
			hunk = 0;
			if (cmp > 0) {
				strbuf_swap(&one, &two);
			}
			if (out) {
				fputs("<<<<<<<\n", out);
				fwrite(one.buf, one.len, 1, out);
				fputs("=======\n", out);
				fwrite(two.buf, two.len, 1, out);
				fputs(">>>>>>>\n", out);
			}
			if (sha1) {
				SHA1_Update(&ctx, one.buf, one.len + 1);
				SHA1_Update(&ctx, two.buf, two.len + 1);
			}
			strbuf_reset(&one);
			strbuf_reset(&two);
		} else if (hunk == 1)
			strbuf_addstr(&one, buf);
		else if (hunk == 2)
			strbuf_addstr(&two, buf);
		else if (out)
			fputs(buf, out);
	}
	strbuf_release(&one);
	strbuf_release(&two);
	...
}

When one or two are empty and the caller asked for checksumming,
the code still relies on one.buf being an allocated memory with
an extra NUL termination and tries to feed the NULL pointer to
SHA1_Update() with length of 1!

An obvious workaround is to say "strbuf_init(&one, !!sha1)" to
force the allocation.  However, because the second parameter to
strbuf_init() is defined to be merely a hint, we should not rely
on strbuf_init() with non-zero hint to have allocation from the
beginning.  It is assuming too much.

A more defensive way would be for the user to do something like:

	SHA1_Update(&ctx, one.buf ? one.buf : "", one.len + 1);
	SHA1_Update(&ctx, two.buf ? two.buf : "", two.len + 1);

which I am leaning towards, but this looks ugly.

I was already bitten by another bug in strbuf_setlen() that
shared the same roots; an empty strbuf can have two internal
representations ("alloc == 0 && buf == NULL" vs "alloc != 0 &&
buf != NULL") and they behave differently.

For example, this happens to be Ok but the validity of it is
subtle.  If a or b is empty, memcmp may get a NULL pointer but
we ask only 0 byte comparison.

        int strbuf_cmp(struct strbuf *a, struct strbuf *b)
        {
                int cmp;
                if (a->len < b->len) {
                        cmp = memcmp(a->buf, b->buf, a->len);
                        return cmp ? cmp : -1;
                } else {
                        cmp = memcmp(a->buf, b->buf, b->len);
                        return cmp ? cmp : a->len != b->len;
                }
        }

It might be an easier and safer fix to define that strbuf_init()
to always have allocation.  Use of a strbuf in the code _and_
not adding any contents to the buffer should be an exception and
avoiding malloc()/free() for that special case feels optimizing
for the wrong case.

However, there are strbuf instances that are not initialized
(i.e. in BSS or initialized by declaring with STRBUF_INIT), so
we still need to handle (.len == 0 && .alloc == 0) case
specially anyway.

It would be appreciated if somebody with a fresh pair of eyes
can go over the strbuf series one more time to make sure that we
do not try to blindly use buf.buf, assuming buf.buf[0] is NUL if
(buf.len == 0).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Use of strbuf.buf when strbuf.len == 0
  2007-09-27  6:21 Use of strbuf.buf when strbuf.len == 0 Junio C Hamano
@ 2007-09-27 10:13 ` Pierre Habouzit
  2007-09-27 10:51   ` [PATCH 1/2] double free in builtin-update-index.c Pierre Habouzit
                     ` (2 more replies)
  2007-09-27 11:22 ` Pierre Habouzit
  1 sibling, 3 replies; 10+ messages in thread
From: Pierre Habouzit @ 2007-09-27 10:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1345 bytes --]

On Thu, Sep 27, 2007 at 06:21:24AM +0000, Junio C Hamano wrote:
> It might be an easier and safer fix to define that strbuf_init()
> to always have allocation.  Use of a strbuf in the code _and_
> not adding any contents to the buffer should be an exception and
> avoiding malloc()/free() for that special case feels optimizing
> for the wrong case.
> 
> However, there are strbuf instances that are not initialized
> (i.e. in BSS or initialized by declaring with STRBUF_INIT), so
> we still need to handle (.len == 0 && .alloc == 0) case
> specially anyway.

  I can see a way, that would need special proof-reading of the strbuf
module, but should not harm its users, that would be to change
STRBUF_INIT to work this way:

  { .buf = "", .len = 0, .alloc = 0 }

  It needs to make strbuf_grow and strbuf_release check for ->alloc
before doing anything stupid.

  Though we may have some bits of code that rely on .buf being NULL if
nothing happened. I tried to track them down, but some may remain.

  If you agree with this change, that would solve most of the issues
with almost no cost, then I'll propose a new patch with this change.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] double free in builtin-update-index.c
  2007-09-27 10:13 ` Pierre Habouzit
@ 2007-09-27 10:51   ` Pierre Habouzit
  2007-09-27 10:58   ` [PATCH 2/2] strbuf change: be sure ->buf is never ever NULL Pierre Habouzit
  2007-09-29  0:51   ` Use of strbuf.buf when strbuf.len == 0 Linus Torvalds
  2 siblings, 0 replies; 10+ messages in thread
From: Pierre Habouzit @ 2007-09-27 10:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

path_name is either ptr that should not be freed, or a pointer to a strbuf
buffer that is deallocated when exiting the loop. Don't do that !

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
---
 builtin-update-index.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/builtin-update-index.c b/builtin-update-index.c
index c76879e..e1a938d 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -377,8 +377,6 @@ static void read_index_info(int line_termination)
 				die("git-update-index: unable to update %s",
 				    path_name);
 		}
-		if (path_name != ptr)
-			free(path_name);
 		continue;
 
 	bad_line:
-- 
1.5.3.2.1100.g015a-dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] strbuf change: be sure ->buf is never ever NULL.
  2007-09-27 10:13 ` Pierre Habouzit
  2007-09-27 10:51   ` [PATCH 1/2] double free in builtin-update-index.c Pierre Habouzit
@ 2007-09-27 10:58   ` Pierre Habouzit
  2007-09-29  0:51   ` Use of strbuf.buf when strbuf.len == 0 Linus Torvalds
  2 siblings, 0 replies; 10+ messages in thread
From: Pierre Habouzit @ 2007-09-27 10:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.

strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.

as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.

API changes:
  * strbuf_setlen now always works, so just make strbuf_reset a convenience
    macro.
  * strbuf_detatch takes a size_t* optional argument (meaning it can be
    NULL) to copy the buffer's len, as it was needed for this refactor to
    make the code more readable, and working like the callers.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
---

I've written this patch using the following method:

(1) I've renamed the ->buf member into sb_buf (I grepped before it wasn't
    a string used in the project) and made the changes I describe in
    strbuf.[hc]

(2) I've built the project, and renamed every ->buf into sb_buf, and
    applied the needed semantics changes. It's doing that that I found the
    issue I fix in the patch 1/2.

(3) I've then run the testsuite, it passes.

(4) I've sed -i -e 's/\<sb_buf\>/buf/g' *.h *.c


Nobody was directly using the fact that a strbuf that wasn't touched had
its pointer NULL, though people detach'ing them do, and strbuf_detach
complies with that. That's why I think, despite the somehow tasteless
"strbuf_slopbuf" it's the best way to go.

 builtin-apply.c       |   16 +++++++---------
 builtin-archive.c     |    5 ++---
 builtin-fetch--tool.c |    2 +-
 commit.c              |    2 +-
 convert.c             |    4 ++--
 diff.c                |   14 ++++++--------
 entry.c               |    3 +--
 fast-import.c         |    2 +-
 imap-send.c           |    2 +-
 quote.c               |    2 +-
 sha1_file.c           |    3 +--
 strbuf.c              |   27 ++++++++++++++++-----------
 strbuf.h              |   10 +++++-----
 13 files changed, 45 insertions(+), 47 deletions(-)

diff --git a/builtin-apply.c b/builtin-apply.c
index 450f0a8..833b142 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -178,14 +178,13 @@ static void say_patch_name(FILE *output, const char *pre, struct patch *patch, c
 #define CHUNKSIZE (8192)
 #define SLOP (16)
 
-static void *read_patch_file(int fd, unsigned long *sizep)
+static void *read_patch_file(int fd, size_t *sizep)
 {
 	struct strbuf buf;
 
 	strbuf_init(&buf, 0);
 	if (strbuf_read(&buf, fd, 0) < 0)
 		die("git-apply: read returned %s", strerror(errno));
-	*sizep = buf.len;
 
 	/*
 	 * Make sure that we have some slop in the buffer
@@ -194,7 +193,7 @@ static void *read_patch_file(int fd, unsigned long *sizep)
 	 */
 	strbuf_grow(&buf, SLOP);
 	memset(buf.buf + buf.len, 0, SLOP);
-	return strbuf_detach(&buf);
+	return strbuf_detach(&buf, sizep);
 }
 
 static unsigned long linelen(const char *buffer, unsigned long size)
@@ -253,7 +252,7 @@ static char *find_name(const char *line, char *def, int p_value, int terminate)
 				 */
 				strbuf_remove(&name, 0, cp - name.buf);
 				free(def);
-				return name.buf;
+				return strbuf_detach(&name, NULL);
 			}
 		}
 		strbuf_release(&name);
@@ -607,7 +606,7 @@ static char *git_header_name(char *line, int llen)
 			if (strcmp(cp + 1, first.buf))
 				goto free_and_fail1;
 			strbuf_release(&sp);
-			return first.buf;
+			return strbuf_detach(&first, NULL);
 		}
 
 		/* unquoted second */
@@ -618,7 +617,7 @@ static char *git_header_name(char *line, int llen)
 		if (line + llen - cp != first.len + 1 ||
 		    memcmp(first.buf, cp, first.len))
 			goto free_and_fail1;
-		return first.buf;
+		return strbuf_detach(&first, NULL);
 
 	free_and_fail1:
 		strbuf_release(&first);
@@ -655,7 +654,7 @@ static char *git_header_name(char *line, int llen)
 			    isspace(name[len])) {
 				/* Good */
 				strbuf_remove(&sp, 0, np - sp.buf);
-				return sp.buf;
+				return strbuf_detach(&sp, NULL);
 			}
 
 		free_and_fail2:
@@ -1968,8 +1967,7 @@ static int apply_data(struct patch *patch, struct stat *st, struct cache_entry *
 
 	if (apply_fragments(&buf, patch) < 0)
 		return -1; /* note with --reject this succeeds. */
-	patch->result = buf.buf;
-	patch->resultsize = buf.len;
+	patch->result = strbuf_detach(&buf, &patch->resultsize);
 
 	if (0 < patch->is_delete && patch->resultsize)
 		return error("removal patch leaves file contents");
diff --git a/builtin-archive.c b/builtin-archive.c
index 843a9e3..04385de 100644
--- a/builtin-archive.c
+++ b/builtin-archive.c
@@ -89,7 +89,7 @@ static void format_subst(const struct commit *commit,
 	struct strbuf fmt;
 
 	if (src == buf->buf)
-		to_free = strbuf_detach(buf);
+		to_free = strbuf_detach(buf, NULL);
 	strbuf_init(&fmt, 0);
 	for (;;) {
 		const char *b, *c;
@@ -153,8 +153,7 @@ void *sha1_file_to_archive(const char *path, const unsigned char *sha1,
 		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
 		convert_to_working_tree(path, buf.buf, buf.len, &buf);
 		convert_to_archive(path, buf.buf, buf.len, &buf, commit);
-		*sizep = buf.len;
-		buffer = buf.buf;
+		buffer = strbuf_detach(&buf, sizep);
 	}
 
 	return buffer;
diff --git a/builtin-fetch--tool.c b/builtin-fetch--tool.c
index 349b59c..1e43d79 100644
--- a/builtin-fetch--tool.c
+++ b/builtin-fetch--tool.c
@@ -10,7 +10,7 @@ static char *get_stdin(void)
 	if (strbuf_read(&buf, 0, 1024) < 0) {
 		die("error reading standard input: %s", strerror(errno));
 	}
-	return strbuf_detach(&buf);
+	return strbuf_detach(&buf, NULL);
 }
 
 static void show_new(enum object_type type, unsigned char *sha1_new)
diff --git a/commit.c b/commit.c
index 1e391e6..20fb220 100644
--- a/commit.c
+++ b/commit.c
@@ -663,7 +663,7 @@ static char *replace_encoding_header(char *buf, const char *encoding)
 					  len - strlen("encoding \n"),
 					  encoding, strlen(encoding));
 	}
-	return tmp.buf;
+	return strbuf_detach(&tmp, NULL);
 }
 
 static char *logmsg_reencode(const struct commit *commit,
diff --git a/convert.c b/convert.c
index 79c9df2..0d5e909 100644
--- a/convert.c
+++ b/convert.c
@@ -168,7 +168,7 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 
 	/* are we "faking" in place editing ? */
 	if (src == buf->buf)
-		to_free = strbuf_detach(buf);
+		to_free = strbuf_detach(buf, NULL);
 
 	strbuf_grow(buf, len + stats.lf - stats.crlf);
 	for (;;) {
@@ -464,7 +464,7 @@ static int ident_to_worktree(const char *path, const char *src, size_t len,
 
 	/* are we "faking" in place editing ? */
 	if (src == buf->buf)
-		to_free = strbuf_detach(buf);
+		to_free = strbuf_detach(buf, NULL);
 	hash_sha1_file(src, len, "blob", sha1);
 
 	strbuf_grow(buf, len + cnt * 43);
diff --git a/diff.c b/diff.c
index 4a7f1e1..0bd7e24 100644
--- a/diff.c
+++ b/diff.c
@@ -197,7 +197,7 @@ static char *quote_two(const char *one, const char *two)
 		strbuf_addstr(&res, one);
 		strbuf_addstr(&res, two);
 	}
-	return res.buf;
+	return strbuf_detach(&res, NULL);
 }
 
 static const char *external_diff(void)
@@ -662,7 +662,7 @@ static char *pprint_rename(const char *a, const char *b)
 		quote_c_style(a, &name, NULL, 0);
 		strbuf_addstr(&name, " => ");
 		quote_c_style(b, &name, NULL, 0);
-		return name.buf;
+		return strbuf_detach(&name, NULL);
 	}
 
 	/* Find common prefix */
@@ -710,7 +710,7 @@ static char *pprint_rename(const char *a, const char *b)
 		strbuf_addch(&name, '}');
 		strbuf_add(&name, a + len_a - sfx_length, sfx_length);
 	}
-	return name.buf;
+	return strbuf_detach(&name, NULL);
 }
 
 struct diffstat_t {
@@ -827,7 +827,7 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options)
 			strbuf_init(&buf, 0);
 			if (quote_c_style(file->name, &buf, NULL, 0)) {
 				free(file->name);
-				file->name = buf.buf;
+				file->name = strbuf_detach(&buf, NULL);
 			} else {
 				strbuf_release(&buf);
 			}
@@ -1519,8 +1519,7 @@ static int populate_from_stdin(struct diff_filespec *s)
 				     strerror(errno));
 
 	s->should_munmap = 0;
-	s->size = buf.len;
-	s->data = strbuf_detach(&buf);
+	s->data = strbuf_detach(&buf, &s->size);
 	s->should_free = 1;
 	return 0;
 }
@@ -1612,8 +1611,7 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
 		if (convert_to_git(s->path, s->data, s->size, &buf)) {
 			munmap(s->data, s->size);
 			s->should_munmap = 0;
-			s->data = buf.buf;
-			s->size = buf.len;
+			s->data = strbuf_detach(&buf, &s->size);
 			s->should_free = 1;
 		}
 	}
diff --git a/entry.c b/entry.c
index 4a8c73b..98f5f6d 100644
--- a/entry.c
+++ b/entry.c
@@ -120,8 +120,7 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
 		strbuf_init(&buf, 0);
 		if (convert_to_working_tree(ce->name, new, size, &buf)) {
 			free(new);
-			new = buf.buf;
-			size = buf.len;
+			new = strbuf_detach(&buf, &size);
 		}
 
 		if (to_tempfile) {
diff --git a/fast-import.c b/fast-import.c
index a870a44..e9c80be 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1562,7 +1562,7 @@ static int read_next_command(void)
 		} else {
 			struct recent_command *rc;
 
-			strbuf_detach(&command_buf);
+			strbuf_detach(&command_buf, NULL);
 			stdin_eof = strbuf_getline(&command_buf, stdin, '\n');
 			if (stdin_eof)
 				return EOF;
diff --git a/imap-send.c b/imap-send.c
index e95cdde..a429a76 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1180,7 +1180,7 @@ read_message( FILE *f, msg_data_t *msg )
 	} while (!feof(f));
 
 	msg->len  = buf.len;
-	msg->data = strbuf_detach(&buf);
+	msg->data = strbuf_detach(&buf, NULL);
 	return msg->len;
 }
 
diff --git a/quote.c b/quote.c
index 800fd88..482be05 100644
--- a/quote.c
+++ b/quote.c
@@ -22,7 +22,7 @@ void sq_quote_buf(struct strbuf *dst, const char *src)
 	char *to_free = NULL;
 
 	if (dst->buf == src)
-		to_free = strbuf_detach(dst);
+		to_free = strbuf_detach(dst, NULL);
 
 	strbuf_addch(dst, '\'');
 	while (*src) {
diff --git a/sha1_file.c b/sha1_file.c
index f1377fb..83a06a7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2340,8 +2340,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
 		strbuf_init(&nbuf, 0);
 		if (convert_to_git(path, buf, size, &nbuf)) {
 			munmap(buf, size);
-			size = nbuf.len;
-			buf = nbuf.buf;
+			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
 	}
diff --git a/strbuf.c b/strbuf.c
index d5e92ee..450110d 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1,27 +1,30 @@
 #include "cache.h"
 
+/* used as the default ->buf value, so that people can always assume buf is
+   non NULL and ->buf[0] is '\0' */
+char strbuf_slopbuf[32];
+
 void strbuf_init(struct strbuf *sb, size_t hint)
 {
-	memset(sb, 0, sizeof(*sb));
+	sb->alloc = sb->len = 0;
+	sb->buf = strbuf_slopbuf;
 	if (hint)
 		strbuf_grow(sb, hint);
 }
 
 void strbuf_release(struct strbuf *sb)
 {
-	free(sb->buf);
-	memset(sb, 0, sizeof(*sb));
-}
-
-void strbuf_reset(struct strbuf *sb)
-{
-	if (sb->len)
-		strbuf_setlen(sb, 0);
+	if (sb->alloc) {
+		free(sb->buf);
+		strbuf_init(sb, 0);
+	}
 }
 
-char *strbuf_detach(struct strbuf *sb)
+char *strbuf_detach(struct strbuf *sb, size_t *sz)
 {
-	char *res = sb->buf;
+	char *res = sb->alloc ? sb->buf : NULL;
+	if (sz)
+		*sz = sb->len;
 	strbuf_init(sb, 0);
 	return res;
 }
@@ -40,6 +43,8 @@ void strbuf_grow(struct strbuf *sb, size_t extra)
 {
 	if (sb->len + extra + 1 <= sb->len)
 		die("you want to use way too much memory");
+	if (!sb->alloc)
+		sb->buf = NULL;
 	ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
 }
 
diff --git a/strbuf.h b/strbuf.h
index fd68389..a92222b 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -10,8 +10,7 @@
  * 1. the ->buf member is always malloc-ed, hence strbuf's can be used to
  *    build complex strings/buffers whose final size isn't easily known.
  *
- *    It is legal to copy the ->buf pointer away. Though if you want to reuse
- *    the strbuf after that, setting ->buf to NULL isn't legal.
+ *    It is NOT legal to copy the ->buf pointer away.
  *    `strbuf_detach' is the operation that detachs a buffer from its shell
  *    while keeping the shell valid wrt its invariants.
  *
@@ -41,19 +40,19 @@
 
 #include <assert.h>
 
+extern char strbuf_slopbuf[];
 struct strbuf {
 	size_t alloc;
 	size_t len;
 	char *buf;
 };
 
-#define STRBUF_INIT  { 0, 0, NULL }
+#define STRBUF_INIT  { 0, 0, strbuf_slopbuf }
 
 /*----- strbuf life cycle -----*/
 extern void strbuf_init(struct strbuf *, size_t);
 extern void strbuf_release(struct strbuf *);
-extern void strbuf_reset(struct strbuf *);
-extern char *strbuf_detach(struct strbuf *);
+extern char *strbuf_detach(struct strbuf *, size_t *);
 extern void strbuf_attach(struct strbuf *, void *, size_t, size_t);
 static inline void strbuf_swap(struct strbuf *a, struct strbuf *b) {
 	struct strbuf tmp = *a;
@@ -75,6 +74,7 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len) {
 	sb->len = len;
 	sb->buf[len] = '\0';
 }
+#define strbuf_reset(sb)  strbuf_setlen(sb, 0)
 
 /*----- content related -----*/
 extern void strbuf_rtrim(struct strbuf *);
-- 
1.5.3.2.1100.g015a-dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Use of strbuf.buf when strbuf.len == 0
  2007-09-27  6:21 Use of strbuf.buf when strbuf.len == 0 Junio C Hamano
  2007-09-27 10:13 ` Pierre Habouzit
@ 2007-09-27 11:22 ` Pierre Habouzit
  2007-09-27 11:33   ` [PROPER PATCH 1/1] Make read_patch_file work on a strbuf Pierre Habouzit
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Pierre Habouzit @ 2007-09-27 11:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1218 bytes --]

On Thu, Sep 27, 2007 at 06:21:24AM +0000, Junio C Hamano wrote:
> It would be appreciated if somebody with a fresh pair of eyes
> can go over the strbuf series one more time to make sure that we
> do not try to blindly use buf.buf, assuming buf.buf[0] is NUL if
> (buf.len == 0).

  Like said in the 2/2 patch, I think it's better if people could be
able to always assume that and be done with it, else you have to know
this internal duality of the empty strbuf and it sucks.

  Instead, what is important, is that people that initialized a strbuf,
then want to go back in the char* world gets a NULL if nothing was
allocated. It is a semantics that is used in a few places (it's arguable
that it's a right thing to assume though). For those, making
strbuf_detach use mandatory, and dealing with the special ->alloc == 0
case is the easiest way.

  And as I don't trust my eyes to be fresh, I've used the aid of the
compiler to bust any place where we were using .buf members directly,
possibly doing something stupid.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/1] Make read_patch_file work on a strbuf.
  2007-09-27 11:22 ` Pierre Habouzit
  2007-09-27 11:33   ` [PROPER PATCH 1/1] Make read_patch_file work on a strbuf Pierre Habouzit
@ 2007-09-27 11:33   ` Pierre Habouzit
  2007-09-27 11:37   ` Use of strbuf.buf when strbuf.len == 0 Pierre Habouzit
  2 siblings, 0 replies; 10+ messages in thread
From: Pierre Habouzit @ 2007-09-27 11:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

So that we don't need to use strbuf_detach.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
---
 builtin-apply.c |   18 ++++++++----------
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/builtin-apply.c b/builtin-apply.c
index 833b142..b0c6e60 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -178,7 +178,7 @@ static void say_patch_name(FILE *output, const char *pre, struct patch *patch, c
 #define CHUNKSIZE (8192)
 #define SLOP (16)
 
-static void *read_patch_file(int fd, size_t *sizep)
+static void read_patch_file(struct strbuf *sb, int fd)
 {
 	struct strbuf buf;
 
@@ -193,7 +193,6 @@ static void *read_patch_file(int fd, size_t *sizep)
 	 */
 	strbuf_grow(&buf, SLOP);
 	memset(buf.buf + buf.len, 0, SLOP);
-	return strbuf_detach(&buf, sizep);
 }
 
 static unsigned long linelen(const char *buffer, unsigned long size)
@@ -2648,22 +2647,22 @@ static void prefix_patches(struct patch *p)
 
 static int apply_patch(int fd, const char *filename, int inaccurate_eof)
 {
-	unsigned long offset, size;
-	char *buffer = read_patch_file(fd, &size);
+	size_t offset;
+	struct strbuf buf;
 	struct patch *list = NULL, **listp = &list;
 	int skipped_patch = 0;
 
+	strbuf_init(&buf, 0);
 	patch_input_file = filename;
-	if (!buffer)
-		return -1;
+	read_patch_file(&buf, fd);
 	offset = 0;
-	while (size > 0) {
+	while (offset < buf.len) {
 		struct patch *patch;
 		int nr;
 
 		patch = xcalloc(1, sizeof(*patch));
 		patch->inaccurate_eof = inaccurate_eof;
-		nr = parse_chunk(buffer + offset, size, patch);
+		nr = parse_chunk(buf.buf + offset, buf.len, patch);
 		if (nr < 0)
 			break;
 		if (apply_in_reverse)
@@ -2681,7 +2680,6 @@ static int apply_patch(int fd, const char *filename, int inaccurate_eof)
 			skipped_patch++;
 		}
 		offset += nr;
-		size -= nr;
 	}
 
 	if (whitespace_error && (new_whitespace == error_on_whitespace))
@@ -2716,7 +2714,7 @@ static int apply_patch(int fd, const char *filename, int inaccurate_eof)
 	if (summary)
 		summary_patch_list(list);
 
-	free(buffer);
+	strbuf_release(&buf);
 	return 0;
 }
 
-- 
1.5.3.2.1101.g89e60-dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PROPER PATCH 1/1] Make read_patch_file work on a strbuf.
  2007-09-27 11:22 ` Pierre Habouzit
@ 2007-09-27 11:33   ` Pierre Habouzit
  2007-09-27 11:33   ` [PATCH " Pierre Habouzit
  2007-09-27 11:37   ` Use of strbuf.buf when strbuf.len == 0 Pierre Habouzit
  2 siblings, 0 replies; 10+ messages in thread
From: Pierre Habouzit @ 2007-09-27 11:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

So that we don't need to use strbuf_detach.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
---

Sorry, in the hurry I sent a wrong patch before, this one works :)

 builtin-apply.c |   27 +++++++++++----------------
 1 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/builtin-apply.c b/builtin-apply.c
index 833b142..1f0a672 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -178,12 +178,9 @@ static void say_patch_name(FILE *output, const char *pre, struct patch *patch, c
 #define CHUNKSIZE (8192)
 #define SLOP (16)
 
-static void *read_patch_file(int fd, size_t *sizep)
+static void read_patch_file(struct strbuf *sb, int fd)
 {
-	struct strbuf buf;
-
-	strbuf_init(&buf, 0);
-	if (strbuf_read(&buf, fd, 0) < 0)
+	if (strbuf_read(sb, fd, 0) < 0)
 		die("git-apply: read returned %s", strerror(errno));
 
 	/*
@@ -191,9 +188,8 @@ static void *read_patch_file(int fd, size_t *sizep)
 	 * so that we can do speculative "memcmp" etc, and
 	 * see to it that it is NUL-filled.
 	 */
-	strbuf_grow(&buf, SLOP);
-	memset(buf.buf + buf.len, 0, SLOP);
-	return strbuf_detach(&buf, sizep);
+	strbuf_grow(sb, SLOP);
+	memset(sb->buf + sb->len, 0, SLOP);
 }
 
 static unsigned long linelen(const char *buffer, unsigned long size)
@@ -2648,22 +2644,22 @@ static void prefix_patches(struct patch *p)
 
 static int apply_patch(int fd, const char *filename, int inaccurate_eof)
 {
-	unsigned long offset, size;
-	char *buffer = read_patch_file(fd, &size);
+	size_t offset;
+	struct strbuf buf;
 	struct patch *list = NULL, **listp = &list;
 	int skipped_patch = 0;
 
+	strbuf_init(&buf, 0);
 	patch_input_file = filename;
-	if (!buffer)
-		return -1;
+	read_patch_file(&buf, fd);
 	offset = 0;
-	while (size > 0) {
+	while (offset < buf.len) {
 		struct patch *patch;
 		int nr;
 
 		patch = xcalloc(1, sizeof(*patch));
 		patch->inaccurate_eof = inaccurate_eof;
-		nr = parse_chunk(buffer + offset, size, patch);
+		nr = parse_chunk(buf.buf + offset, buf.len, patch);
 		if (nr < 0)
 			break;
 		if (apply_in_reverse)
@@ -2681,7 +2677,6 @@ static int apply_patch(int fd, const char *filename, int inaccurate_eof)
 			skipped_patch++;
 		}
 		offset += nr;
-		size -= nr;
 	}
 
 	if (whitespace_error && (new_whitespace == error_on_whitespace))
@@ -2716,7 +2711,7 @@ static int apply_patch(int fd, const char *filename, int inaccurate_eof)
 	if (summary)
 		summary_patch_list(list);
 
-	free(buffer);
+	strbuf_release(&buf);
 	return 0;
 }
 
-- 
1.5.3.2.1102.gefa87-dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Use of strbuf.buf when strbuf.len == 0
  2007-09-27 11:22 ` Pierre Habouzit
  2007-09-27 11:33   ` [PROPER PATCH 1/1] Make read_patch_file work on a strbuf Pierre Habouzit
  2007-09-27 11:33   ` [PATCH " Pierre Habouzit
@ 2007-09-27 11:37   ` Pierre Habouzit
  2 siblings, 0 replies; 10+ messages in thread
From: Pierre Habouzit @ 2007-09-27 11:37 UTC (permalink / raw)
  To: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 646 bytes --]

On Thu, Sep 27, 2007 at 11:22:04AM +0000, Pierre Habouzit wrote:
> (it's arguable > that it's a right thing to assume though)

  I find this ugly, so I've even checked if we did assume that, which is
easy now that such places all use strbuf_detach. One place seemed to,
but wasn't, so I patched it so that it's not the case anymore.

  All other places that use strbuf_detach don't use the fact that it can
return NULL to detect error cases, so we are safe.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Use of strbuf.buf when strbuf.len == 0
  2007-09-27 10:13 ` Pierre Habouzit
  2007-09-27 10:51   ` [PATCH 1/2] double free in builtin-update-index.c Pierre Habouzit
  2007-09-27 10:58   ` [PATCH 2/2] strbuf change: be sure ->buf is never ever NULL Pierre Habouzit
@ 2007-09-29  0:51   ` Linus Torvalds
  2007-09-29  7:48     ` Pierre Habouzit
  2 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2007-09-29  0:51 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: Junio C Hamano, git



On Thu, 27 Sep 2007, Pierre Habouzit wrote:
> 
>   I can see a way, that would need special proof-reading of the strbuf
> module, but should not harm its users, that would be to change
> STRBUF_INIT to work this way:
> 
>   { .buf = "", .len = 0, .alloc = 0 }

I'd like to pipe up a bit here..

I think the above is a good fix for the current problem of wanting to 
always be able to use "sb->buf", but I thinkit actually has the potential 
to fix another issue entirely.

Namely strbuf's that are initialized from various static strings and/or 
strings not directly allocated with malloc().

That's not necessarily something really unusual. Wanting to initialize a 
string with a fixed constant value is a common problem.

And wouldn't it be nice if you could actually do that, with

	{ .buf = "static initializer", .len = 18, .alloc = 0 }

and have all the strbuf routines that modify the initializer (including 
making it shorter!) notice that the allocation is too short, and create a 
new allocation?

Hmm?

			Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Use of strbuf.buf when strbuf.len == 0
  2007-09-29  0:51   ` Use of strbuf.buf when strbuf.len == 0 Linus Torvalds
@ 2007-09-29  7:48     ` Pierre Habouzit
  0 siblings, 0 replies; 10+ messages in thread
From: Pierre Habouzit @ 2007-09-29  7:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 3118 bytes --]

On Sat, Sep 29, 2007 at 12:51:36AM +0000, Linus Torvalds wrote:
> 
> 
> On Thu, 27 Sep 2007, Pierre Habouzit wrote:
> > 
> >   I can see a way, that would need special proof-reading of the strbuf
> > module, but should not harm its users, that would be to change
> > STRBUF_INIT to work this way:
> > 
> >   { .buf = "", .len = 0, .alloc = 0 }
> 
> I'd like to pipe up a bit here..
> 
> I think the above is a good fix for the current problem of wanting to 
> always be able to use "sb->buf", but I thinkit actually has the potential 
> to fix another issue entirely.
> 
> Namely strbuf's that are initialized from various static strings and/or 
> strings not directly allocated with malloc().
> 
> That's not necessarily something really unusual. Wanting to initialize a 
> string with a fixed constant value is a common problem.
> 
> And wouldn't it be nice if you could actually do that, with
> 
> 	{ .buf = "static initializer", .len = 18, .alloc = 0 }
> 
> and have all the strbuf routines that modify the initializer (including 
> making it shorter!) notice that the allocation is too short, and create a 
> new allocation?

  We could probably do that. The places to change to make this work are
seldom:
  * strbuf_grow to emulate a realloc (and copy the buffer into the new
    malloc()ed one),
  * strbuf_release assumes that ->alloc == 0 means _init isn't
    necessary, it would be now,
  * strbuf_setlen should not have the assert anymore (though I'm not
    sure this assert still makes sense with the new initializer).
and that's it.

  But we cannot initialize a strbuf with an immediate string because all
the strbuf APIs suppose that the strbuf buffer are writeable (and IMHO
it's pointless to use a strbuf for reading purposes). Other point, I've
made many readings of the code searching for specific patterns of code,
to replace with strbuf's, and I've never seen places (I do not say those
don't exists though) that would directly benefit from that.

> Hmm?

  So I'd say I'll keep the idea in mind, because it's tasteful and could
help, though I'd prefer Junio to review that patch, and then later add
this new semantics if the need arises.

  The sole thing that may be worth investigating would look like:

      char internal_buf[PATH_MAX]; /* should be damn long enough */
      struct strbuf buf;

      strbuf_init_static(&buf, internal_buf, sizeof(internal_buf);

      /* hack with the strbuf API */

      strbuf_release(&buf); /* do release memory, in case we went over
                               PATH_MAX octets */

because it could save some allocations _and_ be safe at the same time.
But I don't really like it, when allocation is critical in git, it's
rarely in functions where there is an obvious size limit for the
problem, and avoiding allocations can be done using static strbufs
(fast-import.c does that in many places).


-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-09-29  7:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-09-27  6:21 Use of strbuf.buf when strbuf.len == 0 Junio C Hamano
2007-09-27 10:13 ` Pierre Habouzit
2007-09-27 10:51   ` [PATCH 1/2] double free in builtin-update-index.c Pierre Habouzit
2007-09-27 10:58   ` [PATCH 2/2] strbuf change: be sure ->buf is never ever NULL Pierre Habouzit
2007-09-29  0:51   ` Use of strbuf.buf when strbuf.len == 0 Linus Torvalds
2007-09-29  7:48     ` Pierre Habouzit
2007-09-27 11:22 ` Pierre Habouzit
2007-09-27 11:33   ` [PROPER PATCH 1/1] Make read_patch_file work on a strbuf Pierre Habouzit
2007-09-27 11:33   ` [PATCH " Pierre Habouzit
2007-09-27 11:37   ` Use of strbuf.buf when strbuf.len == 0 Pierre Habouzit

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).