git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Robear Selwans <rwagih.rw@gmail.com>
Cc: git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [GSoC][RFC][PATCH 1/2] STRBUF_INIT_CONST: a new way to initialize strbuf
Date: Tue, 18 Feb 2020 01:21:24 -0500	[thread overview]
Message-ID: <20200218062124.GF1641086@coredump.intra.peff.net> (raw)
In-Reply-To: <20200218041805.10939-2-robear.selwans@outlook.com>

On Tue, Feb 18, 2020 at 04:18:04AM +0000, Robear Selwans wrote:

> A new function `STRBUF_INIT_CONST(const_str)` was added to allow for a
> quick initialization of strbuf.
> 
> Details:
> Using `STRBUF_INIT_CONST(str)` creates a new struct of type `strbuf` and
> initializes its `buf`, `len` and `alloc` as `str`, `strlen(str)` and
> `0`, respectively.
> 
> Use Case:
> This is meant to be used to initialize strbufs with constant values and
> thus, only allocating memory when needed.
> 
> Usage Example:
> ```
> strbuf env_var = STRBUF_INIT_CONST("dev");
> ```

This seems a bit dangerous to me, as we're initializing a non-const
pointer with a string literal. In fact, I'm a little surprised that the
compiler doesn't complain, but I think it's mostly due to historical
C-isms (the type of string literals is array-of-char). Using gcc's
-Wwrite-strings does complain, but there are several other cases already
in Git (looking at a few, I think there are some opportunities for
cleanup).

Your second patch catches cases where the strbuf functions want to write
to the buffer. But we've always been pretty open about the fact that
strbuf.buf is a writeable C-style string. So something like this:

  struct strbuf x = STRBUF_INIT_CONST("foo");
  size_t i;

  for (i = 0; i < x.len; i++)
	x.buf[i] = toupper(x.buf[i]);

would generate no compile-time warnings, but would invoke undefined
behavior (on my system it segfaults when run, but it could have even
more confusing outcomes). Even though this is called out specifically in
the strbuf docs:

   However, it is totally safe to modify anything in the string pointed
   by the `buf` member, between the indices `0` and `len-1` (inclusive).

Of course it would be easy to fix that by adding a strbuf_make_var()
call. But my concern is cases where the const-ness and the use of the
strbuf are far apart. The point of a strbuf is that you can just use it
without worrying, but now it's carrying this extra hidden state.

If we want to pursue this direction, I think we'd do better to give each
strbuf a matching array. Something like:

  #define STRBUF_INIT_FROM(buf) { .alloc = 0, .buf = buf, .len = ARRAY_SIZE(buf)-1 }
  ...
  char foo_buf[] = "this is the constant value";
  struct strbuf foo = STRBUF_INIT_FROM(foo_buf);

That gives you a true writeable buffer with the const data in it. _And_
it opens up the option of strbufs using stack buffers with an empty
initial value for efficiency (i.e., avoiding the heap at all for short
common cases, but being able to grow when needed). One trouble is that
you can't do it all in a single variable. You'd need something like:

  #define DECLARE_STACK_STRBUF(name, contents) \
	char name##_buf[] = (contents);
	struct strbuf name = STRBUF_INIT_FROM(name##_buf)
  ...
  DECLARE_STACK_STRBUF(foo, "this is the constant value");

But that gets weirdly un-C-like (your macro expands to multiple
statements, which is usually a macro pitfall; but we can't use the usual
"do { } while(0)" trick here, because the variables would go out of
scope at the end of the fake block.

So I think there are interesting directions here, but there's a lot of
stuff to figure out.

I notice you put GSoC in your subject line. If you're looking at this as
a microproject, IMHO this is _way_ more complicated and subtle than a
microproject should be. The goal there is to give something so easy that
you get to focus on getting your patches in and interacting with the
community. The scope I'd expect is more along the lines of compiling
with -Wwrite-strings and cleaning up some of the locations that
complain.

-Peff

  reply	other threads:[~2020-02-18  6:21 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-18  4:18 [GSoC][RFC][PATCH 0/2] STRBUF_INIT_CONST Cover Robear Selwans
2020-02-18  4:18 ` [GSoC][RFC][PATCH 1/2] STRBUF_INIT_CONST: a new way to initialize strbuf Robear Selwans
2020-02-18  6:21   ` Jeff King [this message]
2020-02-18 14:19     ` Robear Selwans
2020-02-18 20:33       ` Jeff King
2020-02-19  8:13   ` Johannes Sixt
2020-02-20 18:49     ` René Scharfe
2020-02-21  5:21       ` Robear Selwans
2020-02-27  6:50       ` Derrick Stolee
2020-02-27 15:55         ` René Scharfe
2020-02-18  4:18 ` [GSoC][RFC][PATCH 2/2] STRBUF_INIT_CONST: Adapting strbuf_* functions Robear Selwans

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200218062124.GF1641086@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=rwagih.rw@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).