Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Junio C Hamano <gitster@pobox.com>
To: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Cc: git@vger.kernel.org, antoine.queru@ensimag.grenoble-inp.fr,
	francois.beutin@ensimag.grenoble-inp.fr, mhagger@alum.mit.edu,
	Johannes.Schindelin@gmx.de, peff@peff.net, mh@glandium.org
Subject: Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory
Date: Mon, 06 Jun 2016 10:19:07 -0700	[thread overview]
Message-ID: <xmqqvb1mxmk4.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20160606151340.22424-4-william.duclot@ensimag.grenoble-inp.fr> (William Duclot's message of "Mon, 6 Jun 2016 17:13:40 +0200")

William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:

> +#define MAX_ALLOC(a, b) (((a)>(b))?(a):(b))

I do not see why this macro is called MAX_ALLOC(); is there anything
"alloc" specific to what this does?  You may happen to use it only
for "alloc" related things, but that is not a good reason for such a
name (if the implementation can only be used for "alloc" specific
purpose for some reason, then the name is appropriate).

> @@ -20,16 +23,47 @@ char strbuf_slopbuf[1];
>  
>  void strbuf_init(struct strbuf *sb, size_t hint)
>  {
> +	sb->owns_memory = 0;
> +	sb->fixed_memory = 0;
>  	sb->alloc = sb->len = 0;
>  	sb->buf = strbuf_slopbuf;
>  	if (hint)
>  		strbuf_grow(sb, hint);
>  }

I'll complain about _grow() again later, but if the implementation
is updated to address that complaint, slopbuf-using strbuf could
start it as a special case of "starts as fixed_memory".

> +static void strbuf_wrap_internal(struct strbuf *sb, char *buf,
> +				 size_t buf_len, size_t alloc_len)
> +{
> +	if (!buf)
> +		die("the buffer of a strbuf cannot be NULL");
> +
> +	strbuf_release(sb);
> +	sb->len = buf_len;
> +	sb->alloc = alloc_len;
> +	sb->buf = buf;
> +}
> +
> +void strbuf_wrap(struct strbuf *sb, char *buf,
> +		 size_t buf_len, size_t alloc_len)
> +{
> +	strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
> +	sb->owns_memory = 0;
> +	sb->fixed_memory = 0;
> +}
>
> +void strbuf_wrap_fixed(struct strbuf *sb, char *buf,
> +		       size_t buf_len, size_t alloc_len)
> +{
> +	strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
> +	sb->owns_memory = 0;
> +	sb->fixed_memory = 1;
> +}

After seeing 2/3, I would say that the pretty side deserves the use
of word "wrap" more.  What the above two does is an enhancement of
what strbuf API has called "attach".  You are introducing a few new
and different ways to attach a piece of memory to a strbuf.

> @@ -37,8 +71,13 @@ void strbuf_release(struct strbuf *sb)
>  char *strbuf_detach(struct strbuf *sb, size_t *sz)
>  {
>  	char *res;
> +
>  	strbuf_grow(sb, 0);
> -	res = sb->buf;
> +	if (sb->owns_memory)
> +		res = sb->buf;
> +	else
> +		res = xmemdupz(sb->buf, sb->len);
> +
>  	if (sz)
>  		*sz = sb->len;
>  	strbuf_init(sb, 0);

Note that the user of the API does not have to care if the memory is
held by the strbuf or if the strbuf is merely borrowing it from the
initial allocation made from elsewhere (e.g. via strbuf_wrap_*) when
finalizing its use of the strbuf API to build the contents and
utilize the resulting buffer.  Theoretically, if the caller knows it
let the piece memory _it_ owns (i.e. !owns_memory) and the piece of
memory it originally gave the strbuf hasn't been moved (i.e. your
fixed_memory, which I will complain again soon), the caller
shouldn't have to get a copy via xmemdupz(), which would both save
an allocation here and free() in the caller once it is done.

But that is not done, and as the API it is a good thing.  It frees
the caller from having to worry about _where_ the memory originally
came from.

>  void strbuf_grow(struct strbuf *sb, size_t extra)
>  {
> -	int new_buf = !sb->alloc;
>  	if (unsigned_add_overflows(extra, 1) ||
>  	    unsigned_add_overflows(sb->len, extra + 1))
>  		die("you want to use way too much memory");
> -	if (new_buf)
> -		sb->buf = NULL;
> -	ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
> -	if (new_buf)
> -		sb->buf[0] = '\0';
> +	if ((sb->fixed_memory) &&
> +	    sb->len + extra + 1 > sb->alloc)
> +		die("you try to overflow the buffer of a fixed strbuf");

Having praised how _detach() frees the user of the API from having
to worry about minute details of allocation, this is an eyesore.  I
really think this "fixed_memory" support MUST GO.  This "die()" here
makes the API useless _unless_ the caller always makes sure that any
operation it performs does not overflow the originally allocated
size, which makes it no better than using xsnprintf() and friends.

A better design would be to get rid of fixed_memory bit and replace
it with "owns_memory" bit.  Its meaning would be "The ->buf pointer
points at something that is not owned by this strbuf, and must not
be freed when resizing or releasing ->buf."

And when you detect that a strbuf with unfreeable memory needs to be
extended (the above condition you call "die()" on), you use the
"strbuf doesn't own the buffer" logic below.

The only difference between your version and an improved one that
gets rid of "fixed_memory" from the API user's point of view is that
they must pre-allocate absolute maximum amount of memory expected to
be consumed with your version in order not to crash, while a "start
with fixed but dynamically allocate from heap if needed" approach
lets the API user preallocate just sufficient amount of memory
expected to be consumed in a typical workload and go without calling
malloc()/free() most of the time.  Oh, and when a rare data comes in
that exceeds your initial estimate, "fixed_memory" version will just
die.

So, as I said in the previous round, I really do not see a point in
this fixed_memory business.  It does not make the implementation of
the API any easier as far as I can tell, either.

> +	/*
> +	 * ALLOC_GROW may do a realloc() if needed, so we must not use it on
> +	 * a buffer the strbuf doesn't own
> +	 */
> +	if (sb->owns_memory) {
> +		ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
> +	} else {
> +		/*
> +		 * The strbuf doesn't own the buffer: to avoid to realloc it,
> +		 * the strbuf needs to use a new buffer without freeing the old
> +		 */
> +		if (sb->len + extra + 1 > sb->alloc) {
> +			size_t new_alloc =
> +				MAX_ALLOC(sb->len + extra + 1,
> +					  alloc_nr(sb->alloc));

It is true that the you cannot simply realloc(->buf), but otherwise
the growth pattern of the buffer is the same between ->owns_memory
case and this case.  And I think the above computes it like so, but
do you really need a one-shot macro MAX_ALLOC that is named in a
misleading way?

I'd find it far easier to read if you wrote the core of ALLOC_GROW()
logic, like this:

	size_t new_nr = sb->len + extra + 1;
	if (new_nr > sb->alloc) {
		size_t new_alloc;
		if (alloc_nr(sb->alloc) < new_nr)
                	new_alloc = new_nr;
		else
			new_alloc = alloc_nr(sb->alloc);
		... copy and make it "owned" ...
	}

A better way would be to introduce a new macro ALLOC_GROW_COUNT() to
cache.h immediately before ALLOC_GROW() is defined and use it to
redo ALLOC_GROW(), perhaps like this:

#define ALLOC_GROW_COUNT(nr, alloc) \
        ((alloc_nr(alloc) < (nr)) ? (nr) : alloc_nr(alloc))

#define ALLOC_GROW(x, nr, alloc) \
	do { \
                if ((nr) > alloc) { \
                	alloc = ALLOC_GROW_COUNT(nr, alloc); \
			REALLOC_ARRAY(x, alloc); \
		} \
	} while (0)			

Then you can do

	if (sb->len + extra + 1 > sb->alloc) {
		size_t new_alloc;
                char *new_buf;

        	new_alloc = ALLOC_GROW_COUNT(sb->len + extra + 1, sb->alloc);
		new_buf = xmalloc(new_alloc);
		memcpy(...);
		...

next prev parent reply	other threads:[~2016-06-06 17:19 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-06 15:13 [PATCH V2 0/3] strbuf: improve API William Duclot
2016-06-06 15:13 ` [PATCH V2 1/3] strbuf: add tests William Duclot
2016-06-06 16:11   ` Matthieu Moy
2016-06-07  8:44   ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 2/3] pretty.c: rename strbuf_wrap() function William Duclot
2016-06-06 16:12   ` Matthieu Moy
2016-06-07  9:04   ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 3/3] strbuf: allow to use preallocated memory William Duclot
2016-06-06 16:17   ` Matthieu Moy
2016-06-06 17:19   ` Junio C Hamano [this message]
2016-06-06 20:39     ` William Duclot
2016-06-06 22:44       ` Junio C Hamano
2016-06-06 22:58         ` Jeff King
2016-06-06 23:24           ` Junio C Hamano
2016-06-06 23:25             ` Junio C Hamano
2016-06-06 23:30             ` Jeff King
2016-06-07  9:06             ` William Duclot
2016-06-07 18:10               ` Junio C Hamano
2016-06-08 16:20               ` Michael Haggerty
2016-06-08 18:07                 ` Junio C Hamano
2016-06-08 19:19                 ` Jeff King
2016-06-08 19:42                   ` [PATCH] send-pack: use buffered I/O to talk to pack-objects Jeff King
2016-06-09 12:10                     ` Matthieu Moy
2016-06-09 14:34                       ` Ramsay Jones
2016-06-09 17:12                         ` Jeff King
2016-06-09 22:40                           ` Ramsay Jones
2016-06-09 16:40                       ` Junio C Hamano
2016-06-09 17:14                         ` Jeff King
2016-06-09 17:22                         ` Matthieu Moy
2016-06-08 19:48                   ` [PATCH V2 3/3] strbuf: allow to use preallocated memory Junio C Hamano
2016-06-08 19:52                     ` Jeff King
2016-06-08 23:05                       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqvb1mxmk4.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=antoine.queru@ensimag.grenoble-inp.fr \
    --cc=francois.beutin@ensimag.grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    --cc=mh@glandium.org \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=william.duclot@ensimag.grenoble-inp.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).