From: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, antoine.queru@ensimag.grenoble-inp.fr,
francois.beutin@ensimag.grenoble-inp.fr, mhagger@alum.mit.edu,
Johannes.Schindelin@gmx.de, peff@peff.net, mh@glandium.org
Subject: Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory
Date: Mon, 6 Jun 2016 22:39:01 +0200 [thread overview]
Message-ID: <20160606203901.GA7667@Messiaen> (raw)
In-Reply-To: <xmqqvb1mxmk4.fsf@gitster.mtv.corp.google.com>
On Mon, Jun 06, 2016 at 10:19:07AM -0700, Junio C Hamano wrote:
> William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:
>
>> +#define MAX_ALLOC(a, b) (((a)>(b))?(a):(b))
>
> I do not see why this macro is called MAX_ALLOC(); is there anything
> "alloc" specific to what this does? You may happen to use it only
> for "alloc" related things, but that is not a good reason for such a
> name (if the implementation can only be used for "alloc" specific
> purpose for some reason, then the name is appropriate).
Absolutely right. This macro won't be needed anyway, as you pointed out
in the end of this review
>> @@ -20,16 +23,47 @@ char strbuf_slopbuf[1];
>>
>> void strbuf_init(struct strbuf *sb, size_t hint)
>> {
>> + sb->owns_memory = 0;
>> + sb->fixed_memory = 0;
>> sb->alloc = sb->len = 0;
>> sb->buf = strbuf_slopbuf;
>> if (hint)
>> strbuf_grow(sb, hint);
>> }
>
> I'll complain about _grow() again later, but if the implementation
> is updated to address that complaint, slopbuf-using strbuf could
> start it as a special case of "starts as fixed_memory".
>
>> +static void strbuf_wrap_internal(struct strbuf *sb, char *buf,
>> + size_t buf_len, size_t alloc_len)
>> +{
>> + if (!buf)
>> + die("the buffer of a strbuf cannot be NULL");
>> +
>> + strbuf_release(sb);
>> + sb->len = buf_len;
>> + sb->alloc = alloc_len;
>> + sb->buf = buf;
>> +}
>> +
>> +void strbuf_wrap(struct strbuf *sb, char *buf,
>> + size_t buf_len, size_t alloc_len)
>> +{
>> + strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
>> + sb->owns_memory = 0;
>> + sb->fixed_memory = 0;
>> +}
>>
>> +void strbuf_wrap_fixed(struct strbuf *sb, char *buf,
>> + size_t buf_len, size_t alloc_len)
>> +{
>> + strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
>> + sb->owns_memory = 0;
>> + sb->fixed_memory = 1;
>> +}
>
> After seeing 2/3, I would say that the pretty side deserves the use
> of word "wrap" more. What the above two does is an enhancement of
> what strbuf API has called "attach". You are introducing a few new
> and different ways to attach a piece of memory to a strbuf.
The naming is still debatable indeed. If there is a consensus on using
"attach", fine with me.
That being said, I'm still hesitant on being able to initialize a strbuf
with an "attach" function, as the user shouldn't have to use
`strbuf_init()` or `= STRBUF_INIT` beforhand. The users can't use
`strbuf_attach()` with a non-initialized strbuf, is that a good idea to
allow them to use `strbuf_attach_preallocated()` on a non-initialized
buffer?
>> @@ -37,8 +71,13 @@ void strbuf_release(struct strbuf *sb)
>> char *strbuf_detach(struct strbuf *sb, size_t *sz)
>> {
>> char *res;
>> +
>> strbuf_grow(sb, 0);
>> - res = sb->buf;
>> + if (sb->owns_memory)
>> + res = sb->buf;
>> + else
>> + res = xmemdupz(sb->buf, sb->len);
>> +
>> if (sz)
>> *sz = sb->len;
>> strbuf_init(sb, 0);
>
> Note that the user of the API does not have to care if the memory is
> held by the strbuf or if the strbuf is merely borrowing it from the
> initial allocation made from elsewhere (e.g. via strbuf_wrap_*) when
> finalizing its use of the strbuf API to build the contents and
> utilize the resulting buffer. Theoretically, if the caller knows it
> let the piece memory _it_ owns (i.e. !owns_memory) and the piece of
> memory it originally gave the strbuf hasn't been moved (i.e. your
> fixed_memory, which I will complain again soon), the caller
> shouldn't have to get a copy via xmemdupz(), which would both save
> an allocation here and free() in the caller once it is done.
>
> But that is not done, and as the API it is a good thing. It frees
> the caller from having to worry about _where_ the memory originally
> came from.
>
>> void strbuf_grow(struct strbuf *sb, size_t extra)
>> {
>> - int new_buf = !sb->alloc;
>> if (unsigned_add_overflows(extra, 1) ||
>> unsigned_add_overflows(sb->len, extra + 1))
>> die("you want to use way too much memory");
>> - if (new_buf)
>> - sb->buf = NULL;
>> - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>> - if (new_buf)
>> - sb->buf[0] = '\0';
>> + if ((sb->fixed_memory) &&
>> + sb->len + extra + 1 > sb->alloc)
>> + die("you try to overflow the buffer of a fixed strbuf");
>
> Having praised how _detach() frees the user of the API from having
> to worry about minute details of allocation, this is an eyesore. I
> really think this "fixed_memory" support MUST GO. This "die()" here
> makes the API useless _unless_ the caller always makes sure that any
> operation it performs does not overflow the originally allocated
> size, which makes it no better than using xsnprintf() and friends.
I'm not a fan of this die() either.
> A better design would be to get rid of fixed_memory bit and replace
> it with "owns_memory" bit. Its meaning would be "The ->buf pointer
> points at something that is not owned by this strbuf, and must not
> be freed when resizing or releasing ->buf."
>
> And when you detect that a strbuf with unfreeable memory needs to be
> extended (the above condition you call "die()" on), you use the
> "strbuf doesn't own the buffer" logic below.
>
> The only difference between your version and an improved one that
> gets rid of "fixed_memory" from the API user's point of view is that
> they must pre-allocate absolute maximum amount of memory expected to
> be consumed with your version in order not to crash, while a "start
> with fixed but dynamically allocate from heap if needed" approach
> lets the API user preallocate just sufficient amount of memory
> expected to be consumed in a typical workload and go without calling
> malloc()/free() most of the time. Oh, and when a rare data comes in
> that exceeds your initial estimate, "fixed_memory" version will just
> die.
>
> So, as I said in the previous round, I really do not see a point in
> this fixed_memory business. It does not make the implementation of
> the API any easier as far as I can tell, either.
I'm not sure to follow you. I agree that the "fixed strbuf" feature is
flawed by the presence of this `die()`. But (unless misunderstanding)
the "owns_memory" bit you talk about does exist in this patch, and allow
the exact behavior you describe.
The patch allow to use a preallocated strbuf OR a preallocated-and-fixed
strbuf. Does the "fixed" part is useful, that's very debatable.
>> + /*
>> + * ALLOC_GROW may do a realloc() if needed, so we must not use it on
>> + * a buffer the strbuf doesn't own
>> + */
>> + if (sb->owns_memory) {
>> + ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>> + } else {
>> + /*
>> + * The strbuf doesn't own the buffer: to avoid to realloc it,
>> + * the strbuf needs to use a new buffer without freeing the old
>> + */
>> + if (sb->len + extra + 1 > sb->alloc) {
>> + size_t new_alloc =
>> + MAX_ALLOC(sb->len + extra + 1,
>> + alloc_nr(sb->alloc));
>
> It is true that the you cannot simply realloc(->buf), but otherwise
> the growth pattern of the buffer is the same between ->owns_memory
> case and this case. And I think the above computes it like so, but
> do you really need a one-shot macro MAX_ALLOC that is named in a
> misleading way?
>
> I'd find it far easier to read if you wrote the core of ALLOC_GROW()
> logic, like this:
>
> size_t new_nr = sb->len + extra + 1;
> if (new_nr > sb->alloc) {
> size_t new_alloc;
> if (alloc_nr(sb->alloc) < new_nr)
> new_alloc = new_nr;
> else
> new_alloc = alloc_nr(sb->alloc);
> ... copy and make it "owned" ...
> }
>
> A better way would be to introduce a new macro ALLOC_GROW_COUNT() to
> cache.h immediately before ALLOC_GROW() is defined and use it to
> redo ALLOC_GROW(), perhaps like this:
>
> #define ALLOC_GROW_COUNT(nr, alloc) \
> ((alloc_nr(alloc) < (nr)) ? (nr) : alloc_nr(alloc))
>
> #define ALLOC_GROW(x, nr, alloc) \
> do { \
> if ((nr) > alloc) { \
> alloc = ALLOC_GROW_COUNT(nr, alloc); \
> REALLOC_ARRAY(x, alloc); \
> } \
> } while (0)
>
> Then you can do
>
> if (sb->len + extra + 1 > sb->alloc) {
> size_t new_alloc;
> char *new_buf;
>
> new_alloc = ALLOC_GROW_COUNT(sb->len + extra + 1, sb->alloc);
> new_buf = xmalloc(new_alloc);
> memcpy(...);
> ...
The second way seems better to me too, I don't know why I were hesitant
to edit the ALLOC_GROW macro
next prev parent reply other threads:[~2016-06-06 20:39 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-06 15:13 [PATCH V2 0/3] strbuf: improve API William Duclot
2016-06-06 15:13 ` [PATCH V2 1/3] strbuf: add tests William Duclot
2016-06-06 16:11 ` Matthieu Moy
2016-06-07 8:44 ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 2/3] pretty.c: rename strbuf_wrap() function William Duclot
2016-06-06 16:12 ` Matthieu Moy
2016-06-07 9:04 ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 3/3] strbuf: allow to use preallocated memory William Duclot
2016-06-06 16:17 ` Matthieu Moy
2016-06-06 17:19 ` Junio C Hamano
2016-06-06 20:39 ` William Duclot [this message]
2016-06-06 22:44 ` Junio C Hamano
2016-06-06 22:58 ` Jeff King
2016-06-06 23:24 ` Junio C Hamano
2016-06-06 23:25 ` Junio C Hamano
2016-06-06 23:30 ` Jeff King
2016-06-07 9:06 ` William Duclot
2016-06-07 18:10 ` Junio C Hamano
2016-06-08 16:20 ` Michael Haggerty
2016-06-08 18:07 ` Junio C Hamano
2016-06-08 19:19 ` Jeff King
2016-06-08 19:42 ` [PATCH] send-pack: use buffered I/O to talk to pack-objects Jeff King
2016-06-09 12:10 ` Matthieu Moy
2016-06-09 14:34 ` Ramsay Jones
2016-06-09 17:12 ` Jeff King
2016-06-09 22:40 ` Ramsay Jones
2016-06-09 16:40 ` Junio C Hamano
2016-06-09 17:14 ` Jeff King
2016-06-09 17:22 ` Matthieu Moy
2016-06-08 19:48 ` [PATCH V2 3/3] strbuf: allow to use preallocated memory Junio C Hamano
2016-06-08 19:52 ` Jeff King
2016-06-08 23:05 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160606203901.GA7667@Messiaen \
--to=william.duclot@ensimag.grenoble-inp.fr \
--cc=Johannes.Schindelin@gmx.de \
--cc=antoine.queru@ensimag.grenoble-inp.fr \
--cc=francois.beutin@ensimag.grenoble-inp.fr \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mh@glandium.org \
--cc=mhagger@alum.mit.edu \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).