git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Cc: git@vger.kernel.org, simon.rabourg@ensimag.grenoble-inp.fr,
	francois.beutin@ensimag.grenoble-inp.fr,
	antoine.queru@ensimag.grenoble-inp.fr,
	matthieu.moy@grenoble-inp.fr, mhagger@alum.mit.edu
Subject: Re: [PATCH 2/2] strbuf: allow to use preallocated memory
Date: Mon, 30 May 2016 14:13:52 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.20.1605301326530.4449@virtualbox> (raw)
In-Reply-To: <20160530103642.7213-3-william.duclot@ensimag.grenoble-inp.fr>

Hi William,

On Mon, 30 May 2016, William Duclot wrote:

> It is unfortunate that it is currently impossible to use a strbuf
> without doing a memory allocation. So code like
> 
> void f()
> {
>     char path[PATH_MAX];
>     ...
> }
> 
> typically gets turned into either
> 
> void f()
> {
>     struct strbuf path;
>     strbuf_add(&path, ...); <-- does a malloc
>     ...
>     strbuf_release(&path);  <-- does a free
> }
> 
> which costs extra memory allocations, or
> 
> void f()
> {
>     static struct strbuf path;
>     strbuf_add(&path, ...);
>     ...
>     strbuf_setlen(&path, 0);
> }
> 
> which, by using a static variable, avoids most of the malloc/free
> overhead, but makes the function unsafe to use recursively or from
> multiple threads. Those limitations prevent strbuf to be used in
> performance-critical operations.

This description is nice and verbose, but maybe something like this would
introduce the subject in a quicker manner?

	When working e.g. with file paths or with dates, strbuf's
	malloc()/free() dance of strbufs can be easily avoided: as
	a sensible initial buffer size is already known, it can be
	allocated on the heap.

The rest of the commit message flows nicely.

> diff --git a/strbuf.c b/strbuf.c
> index 1ba600b..527b986 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -1,6 +1,14 @@
>  #include "cache.h"
>  #include "refs.h"
>  #include "utf8.h"
> +#include <sys/param.h>

Why?

> +/**
> + * Flags
> + * --------------
> + */
> +#define STRBUF_OWNS_MEMORY 1
> +#define STRBUF_FIXED_MEMORY (1 << 1)

>From reading the commit message, I expected STRBUF_OWNS_MEMORY.
STRBUF_FIXED_MEMORY still needs to be explained.

> @@ -20,16 +28,37 @@ char strbuf_slopbuf[1];
>  
>  void strbuf_init(struct strbuf *sb, size_t hint)
>  {
> +	sb->flags = 0;
>  	sb->alloc = sb->len = 0;
>  	sb->buf = strbuf_slopbuf;
>  	if (hint)
>  		strbuf_grow(sb, hint);
>  }
>  
> +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf,
> +			      size_t path_buf_len, size_t alloc_len)
> +{
> +	if (!path_buf)
> +		die("you try to use a NULL buffer to initialize a strbuf");
> +
> +	strbuf_init(sb, 0);
> +	strbuf_attach(sb, path_buf, path_buf_len, alloc_len);
> +	sb->flags &= ~STRBUF_OWNS_MEMORY;
> +	sb->flags &= ~STRBUF_FIXED_MEMORY;

Shorter: sb->flags &= ~(STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY);

> +}
> +
> +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf,
> +		       size_t path_buf_len, size_t alloc_len)
> +{
> +	strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len);
> +	sb->flags |= STRBUF_FIXED_MEMORY;
> +}

Rather than letting strbuf_wrap_preallocated() set sb->flags &=
~FIXED_MEMORY only to revert that decision right away, a static function
could be called by both strbuf_wrap_preallocated() and
strbuf_wrap_fixed().

>  void strbuf_release(struct strbuf *sb)
>  {
>  	if (sb->alloc) {
> -		free(sb->buf);
> +		if (sb->flags & STRBUF_OWNS_MEMORY)
> +			free(sb->buf);
>  		strbuf_init(sb, 0);
>  	}

Should we not reset the flags here, too?

> @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz)
>  {
>  	char *res;
>  	strbuf_grow(sb, 0);
> -	res = sb->buf;
> +	if (sb->flags & STRBUF_OWNS_MEMORY)
> +		res = sb->buf;
> +	else
> +		res = xmemdupz(sb->buf, sb->alloc - 1);

This looks like a usage to be avoided: if we plan to detach the buffer,
anyway, there is no good reason to allocate it on the heap first. I would
at least issue a warning here.

> @@ -51,6 +84,8 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc)
>  	sb->buf   = buf;
>  	sb->len   = len;
>  	sb->alloc = alloc;
> +	sb->flags |= STRBUF_OWNS_MEMORY;
> +	sb->flags &= ~STRBUF_FIXED_MEMORY;
>  	strbuf_grow(sb, 0);
>  	sb->buf[sb->len] = '\0';
>  }
> @@ -61,9 +96,32 @@ void strbuf_grow(struct strbuf *sb, size_t extra)
>  	if (unsigned_add_overflows(extra, 1) ||
>  	    unsigned_add_overflows(sb->len, extra + 1))
>  		die("you want to use way too much memory");
> -	if (new_buf)
> -		sb->buf = NULL;
> -	ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
> +	if ((sb->flags & STRBUF_FIXED_MEMORY) && sb->len + extra + 1 > sb->alloc)
> +		die("you try to make a string overflow the buffer of a fixed strbuf");

We try to avoid running over 80 columns/row. This message could be
more to the point: cannot grow fixed string

> +	/*
> +	 * ALLOC_GROW may do a realloc() if needed, so we must not use it on
> +	 * a buffer the strbuf doesn't own
> +	 */
> +	if (sb->flags & STRBUF_OWNS_MEMORY) {
> +		if (new_buf)
> +			sb->buf = NULL;
> +		ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
> +	} else {
> +		/*
> +		 * The strbuf doesn't own the buffer: to avoid to realloc it,
> +		 * the strbuf needs to use a new buffer without freeing the old
> +		 */
> +		if (sb->len + extra + 1 > sb->alloc) {
> +			size_t new_alloc = MAX(sb->len + extra + 1, alloc_nr(sb->alloc));
> +			char *buf = xmalloc(new_alloc);
> +			memcpy(buf, sb->buf, sb->alloc);
> +			sb->buf = buf;
> +			sb->alloc = new_alloc;
> +			sb->flags |= STRBUF_OWNS_MEMORY;
> +		}
> +	}
> +
>  	if (new_buf)
>  		sb->buf[0] = '\0';
>  }
> diff --git a/strbuf.h b/strbuf.h
> index 7987405..634759c 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -11,11 +11,16 @@
>   * A strbuf is NUL terminated for convenience, but no function in the
>   * strbuf API actually relies on the string being free of NULs.
>   *
> + * You can avoid the malloc/free overhead of `strbuf_init()`, `strbuf_add()` and
> + * `strbuf_release()` by wrapping pre-allocated memory (stack-allocated for
> + * example) using `strbuf_wrap_preallocated()` or `strbuf_wrap_fixed()`.
> + *
>   * strbufs have some invariants that are very important to keep in mind:
>   *
>   *  - The `buf` member is never NULL, so it can be used in any usual C
>   *    string operations safely. strbuf's _have_ to be initialized either by
> - *    `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.
> + *    `strbuf_init()`, `= STRBUF_INIT`, `strbuf_wrap_preallocated()` or
> + *    `strbuf_wrap_fixed()` before the invariants, though.
>   *
>   *    Do *not* assume anything on what `buf` really is (e.g. if it is
>   *    allocated memory or not), use `strbuf_detach()` to unwrap a memory
> @@ -62,13 +67,14 @@
>   * access to the string itself.
>   */
>  struct strbuf {
> +	unsigned int flags;
>  	size_t alloc;
>  	size_t len;
>  	char *buf;
>  };
>  
>  extern char strbuf_slopbuf[];
> -#define STRBUF_INIT  { 0, 0, strbuf_slopbuf }
> +#define STRBUF_INIT  { 0, 0, 0, strbuf_slopbuf }

If I am not mistaken, to preserve the existing behavior the initial flags
should be 1 (own memory).

BTW this demonstrates that it may not be a good idea to declare the
"flags" field globally but then make the actual flags private.

Also: similar use cases in Git used :1 flags (see e.g. the "configured"
field in credential.h).

Ciao,
Johannes

  reply	other threads:[~2016-05-30 12:14 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-30 10:36 [PATCH 0/2] strbuf: improve API William Duclot
2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot
2016-05-30 11:26   ` Johannes Schindelin
2016-05-30 13:42     ` Simon Rabourg
2016-05-30 11:56   ` Matthieu Moy
2016-05-31  2:04   ` Michael Haggerty
2016-05-31  9:48     ` Simon Rabourg
2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot
2016-05-30 12:13   ` Johannes Schindelin [this message]
2016-05-30 13:20     ` William Duclot
2016-05-31  6:21       ` Johannes Schindelin
2016-05-31  3:05     ` Michael Haggerty
2016-05-31  6:41       ` Johannes Schindelin
2016-05-31  8:25         ` Michael Haggerty
2016-05-30 12:52   ` Matthieu Moy
2016-05-30 14:15     ` William Duclot
2016-05-30 14:34       ` Matthieu Moy
2016-05-30 15:16         ` William Duclot
2016-05-31  4:05     ` Michael Haggerty
2016-05-31 15:59       ` William Duclot
2016-06-03 14:04       ` William Duclot
2016-05-30 21:56   ` Mike Hommey
2016-05-30 22:46     ` William Duclot
2016-05-30 22:50       ` Mike Hommey
2016-05-31  6:34   ` Junio C Hamano
2016-05-31 15:45     ` William
2016-05-31 15:54       ` Matthieu Moy
2016-05-31 16:08         ` William Duclot
2016-05-30 11:32 ` [PATCH 0/2] strbuf: improve API Remi Galan Alfonso
2016-06-01  7:42   ` Jeff King
2016-06-01 19:50     ` David Turner
2016-06-01 20:09       ` Jeff King
2016-06-01 20:22         ` David Turner
2016-06-01 21:07     ` Jeff King
2016-06-02 11:11       ` Michael Haggerty
2016-06-02 12:58         ` Matthieu Moy
2016-06-02 14:22           ` William Duclot
2016-06-24 17:20         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.20.1605301326530.4449@virtualbox \
    --to=johannes.schindelin@gmx.de \
    --cc=antoine.queru@ensimag.grenoble-inp.fr \
    --cc=francois.beutin@ensimag.grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    --cc=matthieu.moy@grenoble-inp.fr \
    --cc=mhagger@alum.mit.edu \
    --cc=simon.rabourg@ensimag.grenoble-inp.fr \
    --cc=william.duclot@ensimag.grenoble-inp.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).