git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, antoine.queru@ensimag.grenoble-inp.fr,
	francois.beutin@ensimag.grenoble-inp.fr, mhagger@alum.mit.edu,
	Johannes.Schindelin@gmx.de, peff@peff.net, mh@glandium.org
Subject: Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory
Date: Mon, 6 Jun 2016 22:39:01 +0200	[thread overview]
Message-ID: <20160606203901.GA7667@Messiaen> (raw)
In-Reply-To: <xmqqvb1mxmk4.fsf@gitster.mtv.corp.google.com>

On Mon, Jun 06, 2016 at 10:19:07AM -0700, Junio C Hamano wrote:
> William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:
> 
>> +#define MAX_ALLOC(a, b) (((a)>(b))?(a):(b))
> 
> I do not see why this macro is called MAX_ALLOC(); is there anything
> "alloc" specific to what this does?  You may happen to use it only
> for "alloc" related things, but that is not a good reason for such a
> name (if the implementation can only be used for "alloc" specific
> purpose for some reason, then the name is appropriate).

Absolutely right. This macro won't be needed anyway, as you pointed out
in the end of this review
 
>> @@ -20,16 +23,47 @@ char strbuf_slopbuf[1];
>>  
>>  void strbuf_init(struct strbuf *sb, size_t hint)
>>  {
>> +	sb->owns_memory = 0;
>> +	sb->fixed_memory = 0;
>>  	sb->alloc = sb->len = 0;
>>  	sb->buf = strbuf_slopbuf;
>>  	if (hint)
>>  		strbuf_grow(sb, hint);
>>  }
> 
> I'll complain about _grow() again later, but if the implementation
> is updated to address that complaint, slopbuf-using strbuf could
> start it as a special case of "starts as fixed_memory".
> 
>> +static void strbuf_wrap_internal(struct strbuf *sb, char *buf,
>> +				 size_t buf_len, size_t alloc_len)
>> +{
>> +	if (!buf)
>> +		die("the buffer of a strbuf cannot be NULL");
>> +
>> +	strbuf_release(sb);
>> +	sb->len = buf_len;
>> +	sb->alloc = alloc_len;
>> +	sb->buf = buf;
>> +}
>> +
>> +void strbuf_wrap(struct strbuf *sb, char *buf,
>> +		 size_t buf_len, size_t alloc_len)
>> +{
>> +	strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
>> +	sb->owns_memory = 0;
>> +	sb->fixed_memory = 0;
>> +}
>>
>> +void strbuf_wrap_fixed(struct strbuf *sb, char *buf,
>> +		       size_t buf_len, size_t alloc_len)
>> +{
>> +	strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
>> +	sb->owns_memory = 0;
>> +	sb->fixed_memory = 1;
>> +}
> 
> After seeing 2/3, I would say that the pretty side deserves the use
> of word "wrap" more.  What the above two does is an enhancement of
> what strbuf API has called "attach".  You are introducing a few new
> and different ways to attach a piece of memory to a strbuf.

The naming is still debatable indeed. If there is a consensus on using
"attach", fine with me.
That being said, I'm still hesitant on being able to initialize a strbuf
with an "attach" function, as the user shouldn't have to use
`strbuf_init()` or `= STRBUF_INIT` beforhand. The users can't use
`strbuf_attach()` with a non-initialized strbuf, is that a good idea to
allow them to use `strbuf_attach_preallocated()` on a non-initialized
buffer?

>> @@ -37,8 +71,13 @@ void strbuf_release(struct strbuf *sb)
>>  char *strbuf_detach(struct strbuf *sb, size_t *sz)
>>  {
>>  	char *res;
>> +
>>  	strbuf_grow(sb, 0);
>> -	res = sb->buf;
>> +	if (sb->owns_memory)
>> +		res = sb->buf;
>> +	else
>> +		res = xmemdupz(sb->buf, sb->len);
>> +
>>  	if (sz)
>>  		*sz = sb->len;
>>  	strbuf_init(sb, 0);
> 
> Note that the user of the API does not have to care if the memory is
> held by the strbuf or if the strbuf is merely borrowing it from the
> initial allocation made from elsewhere (e.g. via strbuf_wrap_*) when
> finalizing its use of the strbuf API to build the contents and
> utilize the resulting buffer.  Theoretically, if the caller knows it
> let the piece memory _it_ owns (i.e. !owns_memory) and the piece of
> memory it originally gave the strbuf hasn't been moved (i.e. your
> fixed_memory, which I will complain again soon), the caller
> shouldn't have to get a copy via xmemdupz(), which would both save
> an allocation here and free() in the caller once it is done.
> 
> But that is not done, and as the API it is a good thing.  It frees
> the caller from having to worry about _where_ the memory originally
> came from.
> 
>>  void strbuf_grow(struct strbuf *sb, size_t extra)
>>  {
>> -	int new_buf = !sb->alloc;
>>  	if (unsigned_add_overflows(extra, 1) ||
>>  	    unsigned_add_overflows(sb->len, extra + 1))
>>  		die("you want to use way too much memory");
>> -	if (new_buf)
>> -		sb->buf = NULL;
>> -	ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>> -	if (new_buf)
>> -		sb->buf[0] = '\0';
>> +	if ((sb->fixed_memory) &&
>> +	    sb->len + extra + 1 > sb->alloc)
>> +		die("you try to overflow the buffer of a fixed strbuf");
> 
> Having praised how _detach() frees the user of the API from having
> to worry about minute details of allocation, this is an eyesore.  I
> really think this "fixed_memory" support MUST GO.  This "die()" here
> makes the API useless _unless_ the caller always makes sure that any
> operation it performs does not overflow the originally allocated
> size, which makes it no better than using xsnprintf() and friends.

I'm not a fan of this die() either.

> A better design would be to get rid of fixed_memory bit and replace
> it with "owns_memory" bit.  Its meaning would be "The ->buf pointer
> points at something that is not owned by this strbuf, and must not
> be freed when resizing or releasing ->buf."
> 
> And when you detect that a strbuf with unfreeable memory needs to be
> extended (the above condition you call "die()" on), you use the
> "strbuf doesn't own the buffer" logic below.
> 
> The only difference between your version and an improved one that
> gets rid of "fixed_memory" from the API user's point of view is that
> they must pre-allocate absolute maximum amount of memory expected to
> be consumed with your version in order not to crash, while a "start
> with fixed but dynamically allocate from heap if needed" approach
> lets the API user preallocate just sufficient amount of memory
> expected to be consumed in a typical workload and go without calling
> malloc()/free() most of the time.  Oh, and when a rare data comes in
> that exceeds your initial estimate, "fixed_memory" version will just
> die.
> 
> So, as I said in the previous round, I really do not see a point in
> this fixed_memory business.  It does not make the implementation of
> the API any easier as far as I can tell, either.

I'm not sure to follow you. I agree that the "fixed strbuf" feature is
flawed by the presence of this `die()`. But (unless misunderstanding)
the "owns_memory" bit you talk about does exist in this patch, and allow
the exact behavior you describe.
The patch allow to use a preallocated strbuf OR a preallocated-and-fixed
strbuf. Does the "fixed" part is useful, that's very debatable.

>> +	/*
>> +	 * ALLOC_GROW may do a realloc() if needed, so we must not use it on
>> +	 * a buffer the strbuf doesn't own
>> +	 */
>> +	if (sb->owns_memory) {
>> +		ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>> +	} else {
>> +		/*
>> +		 * The strbuf doesn't own the buffer: to avoid to realloc it,
>> +		 * the strbuf needs to use a new buffer without freeing the old
>> +		 */
>> +		if (sb->len + extra + 1 > sb->alloc) {
>> +			size_t new_alloc =
>> +				MAX_ALLOC(sb->len + extra + 1,
>> +					  alloc_nr(sb->alloc));
> 
> It is true that the you cannot simply realloc(->buf), but otherwise
> the growth pattern of the buffer is the same between ->owns_memory
> case and this case.  And I think the above computes it like so, but
> do you really need a one-shot macro MAX_ALLOC that is named in a
> misleading way?
> 
> I'd find it far easier to read if you wrote the core of ALLOC_GROW()
> logic, like this:
> 
> 	size_t new_nr = sb->len + extra + 1;
> 	if (new_nr > sb->alloc) {
> 		size_t new_alloc;
> 		if (alloc_nr(sb->alloc) < new_nr)
>                 	new_alloc = new_nr;
> 		else
> 			new_alloc = alloc_nr(sb->alloc);
> 		... copy and make it "owned" ...
> 	}
> 
> A better way would be to introduce a new macro ALLOC_GROW_COUNT() to
> cache.h immediately before ALLOC_GROW() is defined and use it to
> redo ALLOC_GROW(), perhaps like this:
> 
> #define ALLOC_GROW_COUNT(nr, alloc) \
>         ((alloc_nr(alloc) < (nr)) ? (nr) : alloc_nr(alloc))
> 
> #define ALLOC_GROW(x, nr, alloc) \
> 	do { \
>                 if ((nr) > alloc) { \
>                 	alloc = ALLOC_GROW_COUNT(nr, alloc); \
> 			REALLOC_ARRAY(x, alloc); \
> 		} \
> 	} while (0)			
> 
> Then you can do
> 
> 	if (sb->len + extra + 1 > sb->alloc) {
> 		size_t new_alloc;
>                 char *new_buf;
> 
>         	new_alloc = ALLOC_GROW_COUNT(sb->len + extra + 1, sb->alloc);
> 		new_buf = xmalloc(new_alloc);
> 		memcpy(...);
> 		...

The second way seems better to me too, I don't know why I were hesitant
to edit the ALLOC_GROW macro

  reply	other threads:[~2016-06-06 20:39 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-06 15:13 [PATCH V2 0/3] strbuf: improve API William Duclot
2016-06-06 15:13 ` [PATCH V2 1/3] strbuf: add tests William Duclot
2016-06-06 16:11   ` Matthieu Moy
2016-06-07  8:44   ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 2/3] pretty.c: rename strbuf_wrap() function William Duclot
2016-06-06 16:12   ` Matthieu Moy
2016-06-07  9:04   ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 3/3] strbuf: allow to use preallocated memory William Duclot
2016-06-06 16:17   ` Matthieu Moy
2016-06-06 17:19   ` Junio C Hamano
2016-06-06 20:39     ` William Duclot [this message]
2016-06-06 22:44       ` Junio C Hamano
2016-06-06 22:58         ` Jeff King
2016-06-06 23:24           ` Junio C Hamano
2016-06-06 23:25             ` Junio C Hamano
2016-06-06 23:30             ` Jeff King
2016-06-07  9:06             ` William Duclot
2016-06-07 18:10               ` Junio C Hamano
2016-06-08 16:20               ` Michael Haggerty
2016-06-08 18:07                 ` Junio C Hamano
2016-06-08 19:19                 ` Jeff King
2016-06-08 19:42                   ` [PATCH] send-pack: use buffered I/O to talk to pack-objects Jeff King
2016-06-09 12:10                     ` Matthieu Moy
2016-06-09 14:34                       ` Ramsay Jones
2016-06-09 17:12                         ` Jeff King
2016-06-09 22:40                           ` Ramsay Jones
2016-06-09 16:40                       ` Junio C Hamano
2016-06-09 17:14                         ` Jeff King
2016-06-09 17:22                         ` Matthieu Moy
2016-06-08 19:48                   ` [PATCH V2 3/3] strbuf: allow to use preallocated memory Junio C Hamano
2016-06-08 19:52                     ` Jeff King
2016-06-08 23:05                       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160606203901.GA7667@Messiaen \
    --to=william.duclot@ensimag.grenoble-inp.fr \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=antoine.queru@ensimag.grenoble-inp.fr \
    --cc=francois.beutin@ensimag.grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mh@glandium.org \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).