git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: Use of strbuf.buf when strbuf.len == 0
Date: Wed, 26 Sep 2007 23:21:24 -0700	[thread overview]
Message-ID: <7vir5wy6fv.fsf@gitster.siamese.dyndns.org> (raw)

I am starting to hate some parts of the strbuf API.

Here is an example.  Can you spot what goes wrong?

static int handle_file(const char *path,
	 unsigned char *sha1, const char *output)
{
	SHA_CTX ctx;
	char buf[1024];
	int hunk = 0, hunk_no = 0;
	struct strbuf one, two;
	...
	if (sha1)
		SHA1_Init(&ctx);

	strbuf_init(&one, 0);
	strbuf_init(&two,  0);
	while (fgets(buf, sizeof(buf), f)) {
		if (!prefixcmp(buf, "<<<<<<< "))
			hunk = 1;
		else if (!prefixcmp(buf, "======="))
			hunk = 2;
		else if (!prefixcmp(buf, ">>>>>>> ")) {
			int cmp = strbuf_cmp(&one, &two);

			hunk_no++;
			hunk = 0;
			if (cmp > 0) {
				strbuf_swap(&one, &two);
			}
			if (out) {
				fputs("<<<<<<<\n", out);
				fwrite(one.buf, one.len, 1, out);
				fputs("=======\n", out);
				fwrite(two.buf, two.len, 1, out);
				fputs(">>>>>>>\n", out);
			}
			if (sha1) {
				SHA1_Update(&ctx, one.buf, one.len + 1);
				SHA1_Update(&ctx, two.buf, two.len + 1);
			}
			strbuf_reset(&one);
			strbuf_reset(&two);
		} else if (hunk == 1)
			strbuf_addstr(&one, buf);
		else if (hunk == 2)
			strbuf_addstr(&two, buf);
		else if (out)
			fputs(buf, out);
	}
	strbuf_release(&one);
	strbuf_release(&two);
	...
}

When one or two are empty and the caller asked for checksumming,
the code still relies on one.buf being an allocated memory with
an extra NUL termination and tries to feed the NULL pointer to
SHA1_Update() with length of 1!

An obvious workaround is to say "strbuf_init(&one, !!sha1)" to
force the allocation.  However, because the second parameter to
strbuf_init() is defined to be merely a hint, we should not rely
on strbuf_init() with non-zero hint to have allocation from the
beginning.  It is assuming too much.

A more defensive way would be for the user to do something like:

	SHA1_Update(&ctx, one.buf ? one.buf : "", one.len + 1);
	SHA1_Update(&ctx, two.buf ? two.buf : "", two.len + 1);

which I am leaning towards, but this looks ugly.

I was already bitten by another bug in strbuf_setlen() that
shared the same roots; an empty strbuf can have two internal
representations ("alloc == 0 && buf == NULL" vs "alloc != 0 &&
buf != NULL") and they behave differently.

For example, this happens to be Ok but the validity of it is
subtle.  If a or b is empty, memcmp may get a NULL pointer but
we ask only 0 byte comparison.

        int strbuf_cmp(struct strbuf *a, struct strbuf *b)
        {
                int cmp;
                if (a->len < b->len) {
                        cmp = memcmp(a->buf, b->buf, a->len);
                        return cmp ? cmp : -1;
                } else {
                        cmp = memcmp(a->buf, b->buf, b->len);
                        return cmp ? cmp : a->len != b->len;
                }
        }

It might be an easier and safer fix to define that strbuf_init()
to always have allocation.  Use of a strbuf in the code _and_
not adding any contents to the buffer should be an exception and
avoiding malloc()/free() for that special case feels optimizing
for the wrong case.

However, there are strbuf instances that are not initialized
(i.e. in BSS or initialized by declaring with STRBUF_INIT), so
we still need to handle (.len == 0 && .alloc == 0) case
specially anyway.

It would be appreciated if somebody with a fresh pair of eyes
can go over the strbuf series one more time to make sure that we
do not try to blindly use buf.buf, assuming buf.buf[0] is NUL if
(buf.len == 0).

             reply	other threads:[~2007-09-27  6:21 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-27  6:21 Junio C Hamano [this message]
2007-09-27 10:13 ` Use of strbuf.buf when strbuf.len == 0 Pierre Habouzit
2007-09-27 10:51   ` [PATCH 1/2] double free in builtin-update-index.c Pierre Habouzit
2007-09-27 10:58   ` [PATCH 2/2] strbuf change: be sure ->buf is never ever NULL Pierre Habouzit
2007-09-29  0:51   ` Use of strbuf.buf when strbuf.len == 0 Linus Torvalds
2007-09-29  7:48     ` Pierre Habouzit
2007-09-27 11:22 ` Pierre Habouzit
2007-09-27 11:33   ` [PROPER PATCH 1/1] Make read_patch_file work on a strbuf Pierre Habouzit
2007-09-27 11:33   ` [PATCH " Pierre Habouzit
2007-09-27 11:37   ` Use of strbuf.buf when strbuf.len == 0 Pierre Habouzit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vir5wy6fv.fsf@gitster.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).