Hi Bruno, On 1/6/20 1:46 PM, Bruno Haible wrote: >>>>> - providing primitives for string allocation reduces the amount of buffer >>>>> overflow bugs that otherwise occur in this area. [1] >>>>> [1] https://lists.gnu.org/archive/html/bug-gnulib/2019-09/msg00031.html >>> ... >> We created a 'catch them all' string/buffer type plus API. It is a good >> compromise for all kinds of situations, works like a memory buffer but >> is guaranteed 0-terminated, allows custom stack buffers with fallback to >> heap if to small. >> >> https://gitlab.com/gnuwget/wget2/blob/master/libwget/buffer.c >> >> >> There also is a sprintf functionality (glibc compatible) using these >> buffers - and the operation is well faster than glibc's sprintf-like >> functions for all format strings tested (tested back a few years). The >> code is also much smaller (380 C code lines), the return values are >> size_t. It doesn't support float/double. >> >> https://gitlab.com/gnuwget/wget2/blob/master/libwget/buffer_printf.c >> >> If there is serious interest, I could prepare modules for gnulib. > > It is interesting that your solution does not only cover the simple cases > (string concatenation, etc.), but also the more complex one, possibly > with if()s in the generation logic, and all this without blatant potential > for buffer overflow bugs. > > So, the solution would consists of the following parts: > (A) A growable buffer type, with up to N (128 or 1024 or so) bytes on > the stack. Preferable, the initial size and if starting with heap or stack buffer should be (runtime) configurable. - initial size because it allows fine-tuning to better avoid reallocations - initial stack if used as local / temporary buffer - initial heap when you already know that the resulting string has to persist the function return Currently there is are two init functions (I leave away the wget namespace): int buffer_init(buffer *buf, char *data, size_t size); buffer *buffer_alloc(size_t size); buffer_alloc creates a buffer instance on the heap and initializes it with a heap buffer of size. buffer_init(buf, data, date_size) initializes 'buf' with the given data and data_size. data will not be free'd, so stack data can be used here. buffer_init(buf, NULL, date_size) initializes 'buf' with freshly allocated heap data of size 'data_size'. buffer_init(buf, NULL, 0) initializes 'buf' with freshly allocated heap data of size 128. We could leave this out - it's a currently unused special case to avoid error handling. Then there is int buffer_ensure_capacity(buffer *buf, size_t size); > (B) A set of functions for appending to such a growable buffer. To copy a number of bytes to the beginning (effectively dropping the previous content): size_t buffer_memcpy(buffer *buf, const void *data, size_t length); To append a number of bytes: size_t buffer_memcat(buffer *buf, const void *data, size_t length); To copy a string to the beginning (effectively dropping the previous content): size_t buffer_strcpy(buffer *buf, const char *s); To append a string: size_t buffer_strcat(buffer *buf, const char *s); To set a number of bytes at the beginning (effectively dropping the previous content): size_t buffer_memset(buffer *buf, char c, size_t length); To append a number of the same bytes: size_t buffer_memset_append(buffer *buf, char c, size_t length); > (C) A function for creating a heap-allocated 'char *' from a growable > buffer. Currently we do: buffer buf; buffer_init(&buf, NULL, date_size); // allocate buf.data on heap ... add stuff to buf ... mydata = buf.data; buf.data = NULL; buffer_deinit(&buf); We could make up a (static inline) for this, named void *buffer_deinit_transfer(buffer *buf); This function could also call realloc() to shrink 'data' to it's occupied length. > (D) Short-hand functions for the simple cases (like string concatenation). See above, e.g. buffer_strcpy(buf, scheme); buffer_strcat(buf, "://"); buffer_strcat(buf, domain); buffer_memcat(buf, ":", 1); buffer_strcat(buf, port_s); buffer_memcat(buf, "/", 1); buffer_strcat(buf, path); But I prefer the slightly slower but better readable form buffer_printf("%s://%s:%d/%s", scheme, domain, port, path); Since our printf-like functions directly write into a buffer, there is no overhead for copying data. > It would be good to have all these well integrated (in terms of function > names and calling conventions). So far, in gnulib, we have only pieces of > it: > - Module 'scratch_buffer' is (A) without (B), (C), (D). > - Modules 'vasnprintf', 'asprintf' are (B), (C), (D) but without (A). > > Before you start writing the code, it's worth looking at the following > questions: > * Should the module 'scratch_buffer' be reused for (A)? Or is this > not possible? If not, can it still have a memory-leak prevention > like described in lib/malloc/scratch_buffer.h? I don't see the advantage of the described memory-leak prevention. On memory error scratch_buffer_grow() releases all memory, so you can simply return ? Most functions have a 'cleanup' part anyways, which needs to be jumped to before return. Why not add a scratch_buffer_free() there ? Also, scratch_buffer has IMO these flaws. (1) The fixed initial size of 1024 on the stack. This should be tunable. (2) If you need a scratch_buffer itself on the heap, you can't free those initial 1024 bytes. (3) The 'expected usage' as described in scratch_buffer.h expects the developer to explicitly design his functions in a certain way. > * What about reusing the complete vasnprintf.c for (B), rather than > adding another, limited printf-like implementation? Yes, would be nice. At least for appending we need an extra malloc/malloc/memcpy/free. vasnprintf reallocates RESULTBUF that means we can't use a stack allocated buffer - thus we lose the performance advantage. Or should we try snprintf first and fallback to vasnprintf in case of truncation ? We want another module e.g. buffer-printf to not pull in vasnprintf when not needing printf-like buffer functions. Once the buffer module is done, we could think of amending vasnprintf to better play with the buffer type. > * Is it best to implement (D) based on (A), (B), (C), or directly > from scratch? Better from scratch, but the code already exists (with FSF copyright) and has been used a lot. So it's not really from scratch, though we can amend/optimize a few things. Regards, Tim