From mboxrd@z Thu Jan 1 00:00:00 1970 From: William Duclot Subject: Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory Date: Mon, 6 Jun 2016 22:39:01 +0200 Message-ID: <20160606203901.GA7667@Messiaen> References: <20160606151340.22424-1-william.duclot@ensimag.grenoble-inp.fr> <20160606151340.22424-4-william.duclot@ensimag.grenoble-inp.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: git@vger.kernel.org, antoine.queru@ensimag.grenoble-inp.fr, francois.beutin@ensimag.grenoble-inp.fr, mhagger@alum.mit.edu, Johannes.Schindelin@gmx.de, peff@peff.net, mh@glandium.org To: Junio C Hamano X-From: git-owner@vger.kernel.org Mon Jun 06 22:39:14 2016 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1bA1Im-0003bE-Ui for gcvg-git-2@plane.gmane.org; Mon, 06 Jun 2016 22:39:13 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752035AbcFFUjI (ORCPT ); Mon, 6 Jun 2016 16:39:08 -0400 Received: from zm-etu-ensimag-1.grenet.fr ([130.190.244.117]:57070 "EHLO zm-etu-ensimag-1.grenet.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751718AbcFFUjH (ORCPT ); Mon, 6 Jun 2016 16:39:07 -0400 Received: from localhost (localhost [127.0.0.1]) by zm-smtpout-1.grenet.fr (Postfix) with ESMTP id 12F772129; Mon, 6 Jun 2016 22:39:03 +0200 (CEST) Received: from zm-smtpout-1.grenet.fr ([127.0.0.1]) by localhost (zm-smtpout-1.grenet.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id FjC5jhy6lO61; Mon, 6 Jun 2016 22:39:03 +0200 (CEST) Received: from zm-smtpauth-2.grenet.fr (zm-smtpauth-2.grenet.fr [130.190.244.123]) by zm-smtpout-1.grenet.fr (Postfix) with ESMTP id E54702087; Mon, 6 Jun 2016 22:39:02 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by zm-smtpauth-2.grenet.fr (Postfix) with ESMTP id DA92E2066; Mon, 6 Jun 2016 22:39:02 +0200 (CEST) Received: from zm-smtpauth-2.grenet.fr ([127.0.0.1]) by localhost (zm-smtpauth-2.grenet.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id XKCFF28Uwhk5; Mon, 6 Jun 2016 22:39:02 +0200 (CEST) Received: from Messiaen (mut38-h02-176-189-75-243.dsl.sta.abo.bbox.fr [176.189.75.243]) by zm-smtpauth-2.grenet.fr (Postfix) with ESMTPSA id 46CE22055; Mon, 6 Jun 2016 22:39:02 +0200 (CEST) Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Mon, Jun 06, 2016 at 10:19:07AM -0700, Junio C Hamano wrote: > William Duclot writes: > >> +#define MAX_ALLOC(a, b) (((a)>(b))?(a):(b)) > > I do not see why this macro is called MAX_ALLOC(); is there anything > "alloc" specific to what this does? You may happen to use it only > for "alloc" related things, but that is not a good reason for such a > name (if the implementation can only be used for "alloc" specific > purpose for some reason, then the name is appropriate). Absolutely right. This macro won't be needed anyway, as you pointed out in the end of this review >> @@ -20,16 +23,47 @@ char strbuf_slopbuf[1]; >> >> void strbuf_init(struct strbuf *sb, size_t hint) >> { >> + sb->owns_memory = 0; >> + sb->fixed_memory = 0; >> sb->alloc = sb->len = 0; >> sb->buf = strbuf_slopbuf; >> if (hint) >> strbuf_grow(sb, hint); >> } > > I'll complain about _grow() again later, but if the implementation > is updated to address that complaint, slopbuf-using strbuf could > start it as a special case of "starts as fixed_memory". > >> +static void strbuf_wrap_internal(struct strbuf *sb, char *buf, >> + size_t buf_len, size_t alloc_len) >> +{ >> + if (!buf) >> + die("the buffer of a strbuf cannot be NULL"); >> + >> + strbuf_release(sb); >> + sb->len = buf_len; >> + sb->alloc = alloc_len; >> + sb->buf = buf; >> +} >> + >> +void strbuf_wrap(struct strbuf *sb, char *buf, >> + size_t buf_len, size_t alloc_len) >> +{ >> + strbuf_wrap_internal(sb, buf, buf_len, alloc_len); >> + sb->owns_memory = 0; >> + sb->fixed_memory = 0; >> +} >> >> +void strbuf_wrap_fixed(struct strbuf *sb, char *buf, >> + size_t buf_len, size_t alloc_len) >> +{ >> + strbuf_wrap_internal(sb, buf, buf_len, alloc_len); >> + sb->owns_memory = 0; >> + sb->fixed_memory = 1; >> +} > > After seeing 2/3, I would say that the pretty side deserves the use > of word "wrap" more. What the above two does is an enhancement of > what strbuf API has called "attach". You are introducing a few new > and different ways to attach a piece of memory to a strbuf. The naming is still debatable indeed. If there is a consensus on using "attach", fine with me. That being said, I'm still hesitant on being able to initialize a strbuf with an "attach" function, as the user shouldn't have to use `strbuf_init()` or `= STRBUF_INIT` beforhand. The users can't use `strbuf_attach()` with a non-initialized strbuf, is that a good idea to allow them to use `strbuf_attach_preallocated()` on a non-initialized buffer? >> @@ -37,8 +71,13 @@ void strbuf_release(struct strbuf *sb) >> char *strbuf_detach(struct strbuf *sb, size_t *sz) >> { >> char *res; >> + >> strbuf_grow(sb, 0); >> - res = sb->buf; >> + if (sb->owns_memory) >> + res = sb->buf; >> + else >> + res = xmemdupz(sb->buf, sb->len); >> + >> if (sz) >> *sz = sb->len; >> strbuf_init(sb, 0); > > Note that the user of the API does not have to care if the memory is > held by the strbuf or if the strbuf is merely borrowing it from the > initial allocation made from elsewhere (e.g. via strbuf_wrap_*) when > finalizing its use of the strbuf API to build the contents and > utilize the resulting buffer. Theoretically, if the caller knows it > let the piece memory _it_ owns (i.e. !owns_memory) and the piece of > memory it originally gave the strbuf hasn't been moved (i.e. your > fixed_memory, which I will complain again soon), the caller > shouldn't have to get a copy via xmemdupz(), which would both save > an allocation here and free() in the caller once it is done. > > But that is not done, and as the API it is a good thing. It frees > the caller from having to worry about _where_ the memory originally > came from. > >> void strbuf_grow(struct strbuf *sb, size_t extra) >> { >> - int new_buf = !sb->alloc; >> if (unsigned_add_overflows(extra, 1) || >> unsigned_add_overflows(sb->len, extra + 1)) >> die("you want to use way too much memory"); >> - if (new_buf) >> - sb->buf = NULL; >> - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); >> - if (new_buf) >> - sb->buf[0] = '\0'; >> + if ((sb->fixed_memory) && >> + sb->len + extra + 1 > sb->alloc) >> + die("you try to overflow the buffer of a fixed strbuf"); > > Having praised how _detach() frees the user of the API from having > to worry about minute details of allocation, this is an eyesore. I > really think this "fixed_memory" support MUST GO. This "die()" here > makes the API useless _unless_ the caller always makes sure that any > operation it performs does not overflow the originally allocated > size, which makes it no better than using xsnprintf() and friends. I'm not a fan of this die() either. > A better design would be to get rid of fixed_memory bit and replace > it with "owns_memory" bit. Its meaning would be "The ->buf pointer > points at something that is not owned by this strbuf, and must not > be freed when resizing or releasing ->buf." > > And when you detect that a strbuf with unfreeable memory needs to be > extended (the above condition you call "die()" on), you use the > "strbuf doesn't own the buffer" logic below. > > The only difference between your version and an improved one that > gets rid of "fixed_memory" from the API user's point of view is that > they must pre-allocate absolute maximum amount of memory expected to > be consumed with your version in order not to crash, while a "start > with fixed but dynamically allocate from heap if needed" approach > lets the API user preallocate just sufficient amount of memory > expected to be consumed in a typical workload and go without calling > malloc()/free() most of the time. Oh, and when a rare data comes in > that exceeds your initial estimate, "fixed_memory" version will just > die. > > So, as I said in the previous round, I really do not see a point in > this fixed_memory business. It does not make the implementation of > the API any easier as far as I can tell, either. I'm not sure to follow you. I agree that the "fixed strbuf" feature is flawed by the presence of this `die()`. But (unless misunderstanding) the "owns_memory" bit you talk about does exist in this patch, and allow the exact behavior you describe. The patch allow to use a preallocated strbuf OR a preallocated-and-fixed strbuf. Does the "fixed" part is useful, that's very debatable. >> + /* >> + * ALLOC_GROW may do a realloc() if needed, so we must not use it on >> + * a buffer the strbuf doesn't own >> + */ >> + if (sb->owns_memory) { >> + ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); >> + } else { >> + /* >> + * The strbuf doesn't own the buffer: to avoid to realloc it, >> + * the strbuf needs to use a new buffer without freeing the old >> + */ >> + if (sb->len + extra + 1 > sb->alloc) { >> + size_t new_alloc = >> + MAX_ALLOC(sb->len + extra + 1, >> + alloc_nr(sb->alloc)); > > It is true that the you cannot simply realloc(->buf), but otherwise > the growth pattern of the buffer is the same between ->owns_memory > case and this case. And I think the above computes it like so, but > do you really need a one-shot macro MAX_ALLOC that is named in a > misleading way? > > I'd find it far easier to read if you wrote the core of ALLOC_GROW() > logic, like this: > > size_t new_nr = sb->len + extra + 1; > if (new_nr > sb->alloc) { > size_t new_alloc; > if (alloc_nr(sb->alloc) < new_nr) > new_alloc = new_nr; > else > new_alloc = alloc_nr(sb->alloc); > ... copy and make it "owned" ... > } > > A better way would be to introduce a new macro ALLOC_GROW_COUNT() to > cache.h immediately before ALLOC_GROW() is defined and use it to > redo ALLOC_GROW(), perhaps like this: > > #define ALLOC_GROW_COUNT(nr, alloc) \ > ((alloc_nr(alloc) < (nr)) ? (nr) : alloc_nr(alloc)) > > #define ALLOC_GROW(x, nr, alloc) \ > do { \ > if ((nr) > alloc) { \ > alloc = ALLOC_GROW_COUNT(nr, alloc); \ > REALLOC_ARRAY(x, alloc); \ > } \ > } while (0) > > Then you can do > > if (sb->len + extra + 1 > sb->alloc) { > size_t new_alloc; > char *new_buf; > > new_alloc = ALLOC_GROW_COUNT(sb->len + extra + 1, sb->alloc); > new_buf = xmalloc(new_alloc); > memcpy(...); > ... The second way seems better to me too, I don't know why I were hesitant to edit the ALLOC_GROW macro