* RFC: add a string-desc module
@ 2023-03-24 21:50 Bruno Haible
2023-03-24 22:32 ` Paul Eggert
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Bruno Haible @ 2023-03-24 21:50 UTC
To: bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 2397 bytes --]
In most application areas, it is not a problem if strings cannot contain NUL
bytes, and thus the C type 'char *' with its NUL terminator works well.
In areas where strings with embedded NUL bytes need to be handled, the common
approach is to use a 'char * data' pointer together with a 'size_t nbytes'
size. This works fine in code that constructs or manipulates strings with
embedded NUL bytes. But when it comes to *storing* them, for example in an
array or as key or value of a hash table, one needs a type that combines these
two fields:
struct
{
size_t nbytes;
char * data;
}
I propose to add a module that adds such a type, together with elementary
functions that work on them.
Such a type was long known as a "string descriptor" in VMS. It's also known
as basic_string_view<char> in C++, or as String in Java.
The type that I'm proposing does not have a NUL byte appended to the data
always and automatically, because I think it is more important to have a
string_desc_substring function that does not cause memory allocation
than to have a string_desc_c function (conversion to 'char *') that does
not cause memory allocation.
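To make this trade-off concrete, here is a minimal, self-contained sketch. It uses a local stand-in type `sd_t` and two hypothetical helpers, `sd_substring` and `sd_to_c`, modeled on string_desc_substring and string_desc_c from the attached header; it is an illustration under those assumptions, not the attached implementation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for the proposed descriptor type (hypothetical name).  */
typedef struct { size_t nbytes; char *data; } sd_t;

/* Zero-copy substring: the result points into the storage of S, so no
   memory is allocated.  The flip side is that no NUL byte can be
   appended after END without overwriting a byte of S.  */
static sd_t
sd_substring (sd_t s, size_t start, size_t end)
{
  if (!(start <= end && end <= s.nbytes))
    abort ();
  sd_t result = { end - start, s.data + start };
  return result;
}

/* Conversion to a C string has to allocate, to make room for the NUL.
   Returns NULL on allocation failure; the caller frees the result.  */
static char *
sd_to_c (sd_t s)
{
  char *result = malloc (s.nbytes + 1);
  if (result == NULL)
    return NULL;
  if (s.nbytes > 0)
    memcpy (result, s.data, s.nbytes);
  result[s.nbytes] = '\0';
  return result;
}
```

For example, describing the buffer "key=value" and calling sd_substring (whole, 4, 9) yields a 5-byte view of "value" that shares the original buffer; only the subsequent sd_to_c call allocates (6 bytes).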
The type that I'm proposing does not have two distinct fields
nbytes_used and nbytes_allocated. Such a type, e.g. [1], attempts to
cover the use-case of accumulating a string as well. But
- The Java experience with String vs. StringBuffer/StringBuilder
shows that it is cleaner to separate the two use cases.
- For the use-case of accumulating a string, C programmers have been using
ad-hoc code with n_used and n_allocated for a long time; there is
no need for anything else (except for lazy people who want C to be
a scripting language).
The type that I'm proposing also does not have fields for heap management,
such as a 'bool heap' [2] or a reference count. That's because I think that
- managing the allocated memory of a data structure is a different
problem than that of representing a string, and it can be achieved
with data outside the string descriptor,
- Such a field would make it wrong to simply assign a string descriptor
to a variable.
Please let me know what you think: Does this have a place in Gnulib? (Or
should it stay in GNU gettext, where I need it for the Perl parser?)
Bruno
[1] https://github.com/websnarf/bstrlib/blob/master/bstrlib.txt
[2] https://github.com/maxim2266/str
[-- Attachment #2: string-desc.h --]
[-- Type: text/x-chdr, Size: 4678 bytes --]
/* GNU gettext - internationalization aids
Copyright (C) 2023 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>. */
/* Written by Bruno Haible <bruno@clisp.org>, 2023. */
#ifndef _STRING_DESC_H
#define _STRING_DESC_H 1
/* Get size_t, ptrdiff_t. */
#include <stddef.h>
/* Get bool. */
#include <stdbool.h>
#ifdef __cplusplus
extern "C" {
#endif
/* Type describing a string that may contain NUL bytes.
It's merely a descriptor of an array of bytes. */
typedef struct string_desc_t string_desc_t;
struct string_desc_t
{
size_t nbytes;
char *data;
};
/* String descriptors can be passed and returned by value. */
/* ==== Side-effect-free operations on string descriptors ==== */
/* Return the length of the string S. */
extern size_t string_desc_length (string_desc_t s);
/* Return the byte at index I of string S.
I must be < length(S). */
extern char string_desc_char_at (string_desc_t s, size_t i);
/* Return a read-only view of the bytes of S. */
extern const char * string_desc_data (string_desc_t s);
/* Return true if S is the empty string. */
extern bool string_desc_is_empty (string_desc_t s);
/* Return true if S starts with PREFIX. */
extern bool string_desc_startswith (string_desc_t s, string_desc_t prefix);
/* Return true if S ends with SUFFIX. */
extern bool string_desc_endswith (string_desc_t s, string_desc_t suffix);
/* Return > 0, == 0, or < 0 if A > B, A == B, A < B.
This uses a lexicographic ordering, where the bytes are compared as
'unsigned char'. */
extern int string_desc_cmp (string_desc_t a, string_desc_t b);
/* Return the index of the first occurrence of C in S,
or -1 if there is none. */
extern ptrdiff_t string_desc_index (string_desc_t s, char c);
/* Return the index of the last occurrence of C in S,
or -1 if there is none. */
extern ptrdiff_t string_desc_last_index (string_desc_t s, char c);
/* Return the index of the first occurrence of NEEDLE in HAYSTACK,
or -1 if there is none. */
extern ptrdiff_t string_desc_contains (string_desc_t haystack, string_desc_t needle);
/* Return a string that represents the C string S, of length strlen (S). */
extern string_desc_t string_desc_from_c (const char *s);
/* Return the substring of S, starting at offset START and ending at offset END.
START must be <= END, and END must be <= length(S).
The result is of length END - START.
The result must not be freed (since its storage is part of the storage
of S). */
extern string_desc_t string_desc_substring (string_desc_t s, size_t start, size_t end);
/* ==== Memory-allocating operations on string descriptors ==== */
/* Return a string of length N, with uninitialized contents. */
extern string_desc_t string_desc_new (size_t n);
/* Return a string of length N, at the given memory address.
No memory is allocated; the result merely describes the bytes at ADDR. */
extern string_desc_t string_desc_new_addr (size_t n, char *addr);
/* Return a string of length N, filled with C. */
extern string_desc_t string_desc_new_filled (size_t n, char c);
/* Return a copy of string S. */
extern string_desc_t string_desc_copy (string_desc_t s);
/* Return the concatenation of N strings. N must be > 0. */
extern string_desc_t string_desc_concat (size_t n, string_desc_t string1, ...);
/* Return a copy of string S, as a NUL-terminated C string. */
extern char * string_desc_c (string_desc_t s);
/* ==== Operations with side effects on string descriptors ==== */
/* Overwrite the byte at index I of string S with C.
I must be < length(S). */
extern void string_desc_set_char_at (string_desc_t s, size_t i, char c);
/* Fill part of S, starting at offset START and ending at offset END,
with copies of C.
START must be <= END, and END must be <= length(S). */
extern void string_desc_fill (string_desc_t s, size_t start, size_t end, char c);
/* Overwrite part of S with T, starting at offset START.
START + length(T) must be <= length (S). */
extern void string_desc_overwrite (string_desc_t s, size_t start, string_desc_t t);
/* Free S. */
extern void string_desc_free (string_desc_t s);
#ifdef __cplusplus
}
#endif
#endif /* _STRING_DESC_H */
[-- Attachment #3: string-desc.c --]
[-- Type: text/x-csrc, Size: 6596 bytes --]
/* GNU gettext - internationalization aids
Copyright (C) 2023 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>. */
/* Written by Bruno Haible <bruno@clisp.org>, 2023. */
#ifdef HAVE_CONFIG_H
# include "config.h"
#endif
/* Specification. */
#include "string-desc.h"
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include "xalloc.h"
/* ==== Side-effect-free operations on string descriptors ==== */
size_t
string_desc_length (string_desc_t s)
{
return s.nbytes;
}
char
string_desc_char_at (string_desc_t s, size_t i)
{
if (!(i < s.nbytes))
/* Invalid argument. */
abort ();
return s.data[i];
}
const char *
string_desc_data (string_desc_t s)
{
return s.data;
}
bool
string_desc_is_empty (string_desc_t s)
{
return s.nbytes == 0;
}
bool
string_desc_startswith (string_desc_t s, string_desc_t prefix)
{
return (s.nbytes >= prefix.nbytes
&& (prefix.nbytes == 0
|| memcmp (s.data, prefix.data, prefix.nbytes) == 0));
}
bool
string_desc_endswith (string_desc_t s, string_desc_t suffix)
{
return (s.nbytes >= suffix.nbytes
&& (suffix.nbytes == 0
|| memcmp (s.data + (s.nbytes - suffix.nbytes), suffix.data,
suffix.nbytes) == 0));
}
int
string_desc_cmp (string_desc_t a, string_desc_t b)
{
if (a.nbytes > b.nbytes)
{
if (b.nbytes == 0)
return 1;
return (memcmp (a.data, b.data, b.nbytes) < 0 ? -1 : 1);
}
else if (a.nbytes < b.nbytes)
{
if (a.nbytes == 0)
return -1;
return (memcmp (a.data, b.data, a.nbytes) > 0 ? 1 : -1);
}
else /* a.nbytes == b.nbytes */
{
if (a.nbytes == 0)
return 0;
return memcmp (a.data, b.data, a.nbytes);
}
}
ptrdiff_t
string_desc_index (string_desc_t s, char c)
{
if (s.nbytes > 0)
{
void *found = memchr (s.data, (unsigned char) c, s.nbytes);
if (found != NULL)
return (char *) found - s.data;
}
return -1;
}
ptrdiff_t
string_desc_last_index (string_desc_t s, char c)
{
if (s.nbytes > 0)
{
void *found = memrchr (s.data, (unsigned char) c, s.nbytes);
if (found != NULL)
return (char *) found - s.data;
}
return -1;
}
ptrdiff_t
string_desc_contains (string_desc_t haystack, string_desc_t needle)
{
if (needle.nbytes == 0)
return 0;
void *found =
memmem (haystack.data, haystack.nbytes, needle.data, needle.nbytes);
if (found != NULL)
return (char *) found - haystack.data;
else
return -1;
}
string_desc_t
string_desc_from_c (const char *s)
{
string_desc_t result;
result.nbytes = strlen (s);
result.data = (char *) s;
return result;
}
string_desc_t
string_desc_substring (string_desc_t s, size_t start, size_t end)
{
string_desc_t result;
if (!(start <= end && end <= s.nbytes))
/* Invalid arguments. */
abort ();
result.nbytes = end - start;
result.data = s.data + start;
return result;
}
/* ==== Memory-allocating operations on string descriptors ==== */
string_desc_t
string_desc_new (size_t n)
{
string_desc_t result;
result.nbytes = n;
if (n == 0)
result.data = NULL;
else
result.data = (char *) xmalloc (n);
return result;
}
string_desc_t
string_desc_new_addr (size_t n, char *addr)
{
string_desc_t result;
result.nbytes = n;
if (n == 0)
result.data = NULL;
else
result.data = addr;
return result;
}
string_desc_t
string_desc_new_filled (size_t n, char c)
{
string_desc_t result;
result.nbytes = n;
if (n == 0)
result.data = NULL;
else
{
result.data = (char *) xmalloc (n);
memset (result.data, (unsigned char) c, n);
}
return result;
}
string_desc_t
string_desc_copy (string_desc_t s)
{
string_desc_t result;
size_t n = s.nbytes;
result.nbytes = n;
if (n == 0)
result.data = NULL;
else
{
result.data = (char *) xmalloc (n);
memcpy (result.data, s.data, n);
}
return result;
}
string_desc_t
string_desc_concat (size_t n, string_desc_t string1, ...)
{
if (n == 0)
/* Invalid argument. */
abort ();
/* First pass: compute the total length.  N must not be modified here,
since the second pass iterates over the same N - 1 extra arguments. */
size_t total = 0;
total += string1.nbytes;
if (n > 1)
{
va_list other_strings;
size_t i;
va_start (other_strings, string1);
for (i = n - 1; i > 0; i--)
{
string_desc_t arg = va_arg (other_strings, string_desc_t);
total += arg.nbytes;
}
va_end (other_strings);
}
/* Second pass: copy the bytes.  Empty strings may have a NULL data
pointer, which must not be passed to memcpy. */
char *combined = (char *) xmalloc (total);
size_t pos = 0;
if (string1.nbytes > 0)
memcpy (combined, string1.data, string1.nbytes);
pos += string1.nbytes;
if (n > 1)
{
va_list other_strings;
size_t i;
va_start (other_strings, string1);
for (i = n - 1; i > 0; i--)
{
string_desc_t arg = va_arg (other_strings, string_desc_t);
if (arg.nbytes > 0)
memcpy (combined + pos, arg.data, arg.nbytes);
pos += arg.nbytes;
}
va_end (other_strings);
}
string_desc_t result;
result.nbytes = total;
result.data = combined;
return result;
}
char *
string_desc_c (string_desc_t s)
{
size_t n = s.nbytes;
char *result = (char *) xmalloc (n + 1);
if (n > 0)
memcpy (result, s.data, n);
result[n] = '\0';
return result;
}
/* ==== Operations with side effects on string descriptors ==== */
void
string_desc_set_char_at (string_desc_t s, size_t i, char c)
{
if (!(i < s.nbytes))
/* Invalid argument. */
abort ();
s.data[i] = c;
}
void
string_desc_fill (string_desc_t s, size_t start, size_t end, char c)
{
if (!(start <= end && end <= s.nbytes))
/* Invalid arguments. */
abort ();
if (start < end)
memset (s.data + start, (unsigned char) c, end - start);
}
void
string_desc_overwrite (string_desc_t s, size_t start, string_desc_t t)
{
if (!(start <= s.nbytes && t.nbytes <= s.nbytes - start))
/* Invalid arguments. */
abort ();
if (t.nbytes > 0)
memcpy (s.data + start, t.data, t.nbytes);
}
void
string_desc_free (string_desc_t s)
{
free (s.data);
}
* Re: RFC: add a string-desc module
From: Paul Eggert @ 2023-03-24 22:32 UTC
To: Bruno Haible, bug-gnulib
On 2023-03-24 14:50, Bruno Haible wrote:
> struct
> {
> size_t nbytes;
> char * data;
> }
One minor comment: use idx_t instead of size_t, for the usual reasons.
Also it might be a bit more efficient to put the pointer first.
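Gnulib's idx.h documents the rationale for idx_t in more detail; roughly, a signed index type keeps an accidental negative result visibly negative (and signed overflow can be caught by tools such as -fsanitize=undefined), whereas size_t arithmetic silently wraps around to a huge value. The sketch below illustrates this with hypothetical function names, assuming idx_t is ptrdiff_t as in Gnulib's idx.h.

```c
#include <stddef.h>

/* Assumption: mirrors Gnulib's idx.h, which defines idx_t as ptrdiff_t.  */
typedef ptrdiff_t idx_t;

/* Length of the range [START, END) with unsigned arithmetic: if a caller
   swaps the arguments, the result silently wraps to a huge value.  */
static size_t
range_length_unsigned (size_t start, size_t end)
{
  return end - start;
}

/* The same with a signed index type: a swapped call yields a negative
   number, which is easy to test for.  */
static idx_t
range_length_signed (idx_t start, idx_t end)
{
  return end - start;
}
```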
* Re: RFC: add a string-desc module
From: Jeffrey Walton @ 2023-03-24 23:20 UTC
To: Bruno Haible; +Cc: bug-gnulib
On Fri, Mar 24, 2023 at 5:50 PM Bruno Haible <bruno@clisp.org> wrote:
>
> In most application areas, it is not a problem if strings cannot contain NUL
> bytes, and thus the C type 'char *' with its NUL terminator is well usable.
>
> In areas where strings with embedded NUL bytes need to be handled, the common
> approach is to use a 'char * data' pointer together with a 'size_t nbytes'
> size. This works fine in code that constructs or manipulates strings with
> embedded NUL bytes. But when it comes to *storing* them, for example in an
> array or as key or value of a hash table, one needs a type that combines these
> two fields:
>
> struct
> {
> size_t nbytes;
> char * data;
> }
>
> I propose to add a module that adds such a type, together with elementary
> functions that work on them.
>
> Such a type was long known as a "string descriptor" in VMS. It's also known
> as basic_string_view<char> in C++, or as String in Java.
>
> The type that I'm proposing does not have a NUL byte appended to the data
> always and automatically, because I think it is more important to have a
> string_desc_substring function that does not cause memory allocation
> than to have a string_desc_c function (conversion to 'char *') that does
> not cause memory allocation.
I would be cautious about not including a NUL. A natural thing to want
to do is print a string, and C-based routines usually expect a
terminating NUL.
Also, if you initialize the struct, then the allocated string will
likely include a terminating NUL. I understand the size member will
omit the NUL, but it will be present anyway in the string. (Unless
you do something ugly, like spell out the characters of the string.)
> The type that I'm proposing does not have two distinct fields
> nbytes_used and nbytes_allocated. Such a type, e.g. [1], attempts to
> cover the use-case of accumulating a string as well. But
> - The Java experience with String vs. StringBuffer/StringBuilder
> shows that it is cleaner to separate the two use cases.
> - For the use-case of accumulating a string, C programmers have been using
> ad-hoc code with n_used and n_allocated for a long time; there is
> no need for anything else (except for lazy people who want C to be
> a scripting language).
>
> The type that I'm proposing also does not have fields for heap management,
> such as a 'bool heap' [2] or a reference count. That's because I think that
> - managing the allocated memory of a data structure is a different
> problem than that of representing a string, and it can be achieved
> with data outside the string descriptor,
> - Such a field would make it wrong to simply assign a string descriptor
> to a variable.
>
> Please let me know what you think: Does this have a place in Gnulib? (Or
> should it stay in GNU gettext, where I need it for the Perl parser?)
A length-prefixed string may be a good idea. It could also help with
safer string handling functions and efficient operations on a string
because the length is already available.
So if you are going to add the "string descriptor", then I hope you
add some functions to make it easier for less experienced folks to
write safer code.
> [1] https://github.com/websnarf/bstrlib/blob/master/bstrlib.txt
> [2] https://github.com/maxim2266/str
Also see libbsd's stringlist.h for some inspiration,
https://cgit.freedesktop.org/libbsd/tree/include/bsd/stringlist.h .
Jeff
* Re: RFC: add a string-desc module
From: Vivien Kraus @ 2023-03-25 6:21 UTC
To: Bruno Haible, bug-gnulib
Hello!
I frequently use ad-hoc code for this, but in library code, where
xmalloc is rarely used.
I learn new gnulib things primarily from the manual. Do you plan to
document it there?
On Friday, 24 March 2023 at 22:50 +0100, Bruno Haible wrote:
> /* Return a copy of string S, as a NUL-terminated C string. */
> extern char * string_desc_c (string_desc_t s);
Would it be appropriate to use the attribute module and mark this
ATTRIBUTE_DEALLOC_FREE?
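For context: Gnulib's attribute.h provides ATTRIBUTE_DEALLOC_FREE, which on GCC 11 and later expands to the `malloc (free, 1)` function attribute. The self-contained sketch below spells that attribute out directly instead of including attribute.h; the macro `MY_DEALLOC_FREE`, the type `sd_t` and the function `sd_to_c` are hypothetical stand-ins for the real names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Gnulib's ATTRIBUTE_DEALLOC_FREE: on GCC 11 and
   later, declare that the returned pointer is to be released with free.
   Elsewhere the macro expands to nothing.  */
#if defined __GNUC__ && !defined __clang__ && __GNUC__ >= 11
# define MY_DEALLOC_FREE __attribute__ ((__malloc__, __malloc__ (free, 1)))
#else
# define MY_DEALLOC_FREE
#endif

typedef struct { size_t nbytes; char *data; } sd_t;

/* Return a NUL-terminated copy of S, or NULL on allocation failure.  */
static char *sd_to_c (sd_t s) MY_DEALLOC_FREE;

static char *
sd_to_c (sd_t s)
{
  char *result = malloc (s.nbytes + 1);
  if (result == NULL)
    return NULL;
  if (s.nbytes > 0)
    memcpy (result, s.data, s.nbytes);
  result[s.nbytes] = '\0';
  return result;
}
```

With the attribute in place, GCC can diagnose, for instance, passing the result to a deallocator other than free (-Wmismatched-dealloc).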
Best regards,
Vivien
* Re: RFC: add a string-desc module
From: Vivien Kraus @ 2023-03-25 6:25 UTC
To: noloader, Bruno Haible; +Cc: bug-gnulib
On Friday, 24 March 2023 at 19:20 -0400, Jeffrey Walton wrote:
> > The type that I'm proposing does not have a NUL byte appended to the
> > data always and automatically, because I think it is more important
> > to have a string_desc_substring function that does not cause memory
> > allocation than to have a string_desc_c function (conversion to
> > 'char *') that does not cause memory allocation.
>
> I would be cautious about not including a NUL. A natural thing to want
> to do is print a string, and C-based routines usually expect a
> terminating NUL.
>
> Also, if you initialize the struct, then the allocated string will
> likely include a terminating NUL. I understand the size member will
> omit the NUL, but it will be present anyway in the string. (Unless
> you do something ugly, like spell out the characters of the string.)
From what I understand, the proposed substring function cannot add a
NUL byte without doing a copy first.
Vivien
* Re: RFC: add a string-desc module
From: Bruno Haible @ 2023-03-25 11:39 UTC
To: bug-gnulib, Paul Eggert
Paul Eggert wrote:
> > struct
> > {
> > size_t nbytes;
> > char * data;
> > }
>
> One minor comment: use idx_t instead of size_t, for the usual reasons.
Right, done. Thanks for the reminder.
> Also it might be a bit more efficient to put the pointer first.
On some CPUs probably, but not on others. Unless it's a clear win, I prefer
to avoid such code changes. The entire struct fits into a cache line anyway.
Even an attribute _Alignas(2*sizeof(long)) would only help on NetBSD, IIRC,
because for heap-allocated data, 2*sizeof(long) is already the default
alignment on most platforms.
Bruno
* Re: RFC: add a string-desc module
From: Bruno Haible @ 2023-03-25 11:49 UTC
To: noloader; +Cc: bug-gnulib
Jeffrey Walton wrote:
> A natural thing to want
> to do is print a string, and C-based routines usually expect a
> terminating NUL.
I'll add a comment regarding printf with the "%.*s" directive.
> Also, if you initialize the struct, then the allocated string will
> likely include a terminating NUL. I understand the size member will
> omit the NUL, but it will be present anyway in the string.
No; it depends where the 'char *' comes from. If it is a pointer into
a piece of memory read through read_file, for example, there will be
no NUL terminator.
Also, in C you can write
char buf[4] = "abcd";
which does not add a NUL.
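The C rule behind this example: when a string literal initializes a char array that has room for exactly its characters, the terminating NUL is not stored (C++ rejects such code). A descriptor can describe such a buffer without any problem. The type `sd_t` and the helpers `sd_from_buffer` and `sd_write` below are hypothetical, for illustration only.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct { size_t nbytes; char *data; } sd_t;

/* Describe the SIZE bytes at BUF; BUF need not be NUL-terminated.  */
static sd_t
sd_from_buffer (char *buf, size_t size)
{
  sd_t result = { size, buf };
  return result;
}

/* Write the bytes of S to FP without relying on a NUL terminator.
   Returns nonzero if all bytes were written.  */
static int
sd_write (sd_t s, FILE *fp)
{
  if (s.nbytes == 0)
    return 1;
  return fwrite (s.data, 1, s.nbytes, fp) == s.nbytes;
}
```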
> A length-prefixed string may be a good idea.
https://github.com/antirez/sds does it like this. But again, this
does not allow for an allocation-free substring function.
> So if you are going to add the "string descriptor", then I hope you
> add some functions to make it easier for less experienced folks to
> write safer code.
I believe all these functions are already in the proposal.
> Also see libbsd's stringlist.h for some inspiration,
> https://cgit.freedesktop.org/libbsd/tree/include/bsd/stringlist.h .
This is unrelated, AFAICS. It's not about a string, but about an
extensible array of strings.
Bruno
* Re: RFC: add a string-desc module
From: Bruno Haible @ 2023-03-25 11:56 UTC
To: bug-gnulib, Vivien Kraus
Vivien Kraus wrote:
> I frequently use ad-hoc code for this, but in library code, where
> xmalloc is rarely used.
Good point. I'll need to duplicate the interface of the memory
allocating functions: one variant with an 'x' prefix, which uses xmalloc,
and one without the 'x', for use in libraries.
> I learn new gnulib things primarily from the manual. Do you plan to
> document it there?
Yes, sure. The reference documentation can stay in the .h file, but
an overview and general usage section belongs in the manual.
> > /* Return a copy of string S, as a NUL-terminated C string. */
> > extern char * string_desc_c (string_desc_t s);
>
> Would it be appropriate to use the attribute module and mark this
> ATTRIBUTE_DEALLOC_FREE?
Good point, yes. Will do!
Thanks for your review and remarks.
Bruno
* Re: RFC: add a string-desc module
From: Paul Eggert @ 2023-03-25 15:51 UTC
To: Bruno Haible; +Cc: bug-gnulib, noloader
On 2023-03-25 04:49, Bruno Haible wrote:
> I'll add a comment regarding printf with the "%.*s" directive.
That works only if the string lacks NULs and its length fits into int,
and one must also convert the idx_t length to int (e.g., via a cast
which I find tricky). Although these limitations could be documented, it
might also be good to have an API like quotearg to generate a quoted or
quotable string that can be printed with plain %s.
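Both limitations are easy to demonstrate. In the sketch below (local type `sd_t`, hypothetical helper names), "%.*s" stops at the first embedded NUL even though the precision covers all the bytes, and the length needs an explicit, range-checked conversion to int. Writing the raw bytes with fwrite avoids both problems, at the cost of not being usable inside a format string.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { size_t nbytes; char *data; } sd_t;

/* Format S with "%.*s" into BUF.  Returns the snprintf result, or -1 if
   the length does not fit in int (the precision argument must be an int).
   Output stops at the first NUL byte inside S.  */
static int
sd_format_precision (char *buf, size_t bufsize, sd_t s)
{
  if (s.nbytes > INT_MAX)
    return -1;
  return snprintf (buf, bufsize, "%.*s", (int) s.nbytes, s.data);
}

/* Write all bytes of S, including embedded NULs, to FP.
   Returns the number of bytes written.  */
static size_t
sd_fwrite (sd_t s, FILE *fp)
{
  return s.nbytes > 0 ? fwrite (s.data, 1, s.nbytes, fp) : 0;
}
```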
* Re: RFC: add a string-desc module
From: Simon Josefsson via Gnulib discussion list @ 2023-03-27 10:15 UTC
To: Bruno Haible; +Cc: bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 1132 bytes --]
Bruno Haible <bruno@clisp.org> writes:
> struct
> {
> size_t nbytes;
> char * data;
> }
>
> I propose to add a module that adds such a type, together with elementary
> functions that work on them.
I think this is a useful contribution; however, I see two deal-breakers
for having it in gnulib -- both related to use in libraries. I think
string helper types/functions like this are useful not only in
applications but also in libraries. Thus:
1) License - there really isn't much novelty here; how about making
this public domain or LGPLv2+?
2) Applicability to use in a library - using x*alloc and abort is
frowned upon in libraries. Libraries should return error codes on
expected errors (and I argue memory allocation failure is an expected
error), and not cause application exits.
What do you think?
One way to resolve 2) is to have two variants of this functionality: one
low-level variant that doesn't abort the application on errors, and one
high-level variant that behaves like your implementation. The
high-level variant could depend on the low-level variant, but that's not
essential.
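A minimal sketch of that layering, with hypothetical names: the low-level function reports allocation failure through its return value and leaves the output untouched, and the high-level 'x' variant is a thin wrapper that turns failure into a program exit, in the spirit of xmalloc. The actual Gnulib module may use different names and signatures.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t nbytes; char *data; } sd_t;

/* Low-level, library-safe variant: store a copy of S in *RESULTP and
   return 0, or return -1 on allocation failure without touching *RESULTP.  */
static int
sd_copy (sd_t *resultp, sd_t s)
{
  char *data = NULL;
  if (s.nbytes > 0)
    {
      data = malloc (s.nbytes);
      if (data == NULL)
        return -1;
      memcpy (data, s.data, s.nbytes);
    }
  resultp->nbytes = s.nbytes;
  resultp->data = data;
  return 0;
}

/* High-level variant for applications, built on the low-level one:
   allocation failure terminates the program, as xmalloc does.  */
static sd_t
xsd_copy (sd_t s)
{
  sd_t result;
  if (sd_copy (&result, s) < 0)
    {
      fputs ("memory exhausted\n", stderr);
      exit (EXIT_FAILURE);
    }
  return result;
}
```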
/Simon
* Re: RFC: add a string-desc module
From: Bruno Haible @ 2023-03-28 22:40 UTC
To: Paul Eggert; +Cc: bug-gnulib, noloader
Paul Eggert wrote:
> > I'll add a comment regarding printf with the "%.*s" directive.
>
> That works only if the string lacks NULs
Ouch, indeed.
> and its length fits into int,
> and one must also convert the idx_t length to int (e.g., via a cast
> which I find tricky).
I've now documented that "%.*s" is NOT the solution.
> Although these limitations could be documented, it
> might also be good to have an API like quotearg to generate a quoted or
> quotable string that can be printed with plain %s.
Good point. I've added wrappers around the quotearg functions. Fortunately,
most of the quotearg functions already have a *_mem variant that was designed
precisely for this case.
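For illustration only, here is a self-contained sketch of the idea behind such a wrapper: produce a freshly allocated, NUL-terminated, escaped rendering of a descriptor that can be printed with plain %s. It does not use or reproduce the quotearg API (Gnulib's quotearg_mem and related functions also handle locales and quoting styles); `sd_escape` and its octal escaping scheme are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t nbytes; char *data; } sd_t;

/* Return a malloc'ed C string in which printable ASCII bytes other than
   backslash are copied as-is, and every other byte (including NUL) is
   written as a backslash followed by three octal digits.  Returns NULL on
   allocation failure or size overflow.  */
static char *
sd_escape (sd_t s)
{
  /* Worst case: 4 output bytes per input byte, plus the terminating NUL.  */
  if (s.nbytes > (SIZE_MAX - 1) / 4)
    return NULL;
  char *result = malloc (4 * s.nbytes + 1);
  if (result == NULL)
    return NULL;
  char *p = result;
  for (size_t i = 0; i < s.nbytes; i++)
    {
      unsigned char c = s.data[i];
      if (c >= 0x20 && c < 0x7f && c != '\\')
        *p++ = c;
      else
        p += sprintf (p, "\\%03o", (unsigned int) c);
    }
  *p = '\0';
  return result;
}
```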
Bruno
* Re: RFC: add a string-desc module
From: Bruno Haible @ 2023-03-28 22:49 UTC
To: Simon Josefsson; +Cc: bug-gnulib
Simon Josefsson wrote:
> I think this is a useful contribution;
Thanks.
> however, I see two deal-breakers
> for having it in gnulib -- both related to use in libraries. I think
> string helper types/functions like this are useful not only in
> applications but also in libraries. Thus:
>
> 1) License - there really isn't much novelty here; how about making
> this public domain or LGPLv2+?
Not public domain — it does not protect the user from patent claims.
Not MIT license — I don't intend to make gifts to proprietary software
vendors. It's bad enough that some companies ignore the requirements
of the GPL. <https://www.youtube.com/watch?v=5rgsXq2e7Ck>
I've put the core module under LGPLv3+.
If you want it under LGPLv2+, it would be OK for my part, but we would
have to relax the 'memrchr' module to LGPLv2+ first.
> 2) Applicability to use in a library - using x*alloc and abort is
> frowned upon in libraries. Libraries should return error codes on
> expected errors (and I argue memory allocation failure is an expected
> error), and not cause application exits.
Done by separating library-safe memory allocations and checked memory
allocations into separate modules.
> One way to resolve 2) is to have two variants of this functionality: one
> low-level variant that doesn't abort the application on errors, and one
> high-level variant that behaves like your implementation. The
> high-level variant could depend on the low-level variant, but that's not
> essential.
Yes, that's how I did it, for the most part. I couldn't do this so easily
for the string_desc_concat function, though, due to varargs.
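The usual C workaround for this varargs problem, familiar from printf/vprintf, is to put the real work into a function that takes a va_list and make every variadic entry point a thin wrapper around it. A sketch under that assumption follows; `sd_vconcat`, `sd_concat` and `xsd_concat` are hypothetical names, and this is not necessarily how the Gnulib module ended up.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t nbytes; char *data; } sd_t;

/* Core: concatenate the N descriptors in ARGS into *RESULTP.
   Returns 0, or -1 on allocation failure.  ARGS is traversed twice,
   so the first pass works on a copy made with va_copy.  */
static int
sd_vconcat (sd_t *resultp, size_t n, va_list args)
{
  va_list counting;
  size_t total = 0;

  va_copy (counting, args);
  for (size_t i = 0; i < n; i++)
    total += va_arg (counting, sd_t).nbytes;
  va_end (counting);

  char *combined = malloc (total > 0 ? total : 1);
  if (combined == NULL)
    return -1;
  size_t pos = 0;
  for (size_t i = 0; i < n; i++)
    {
      sd_t arg = va_arg (args, sd_t);
      if (arg.nbytes > 0)
        memcpy (combined + pos, arg.data, arg.nbytes);
      pos += arg.nbytes;
    }
  resultp->nbytes = total;
  resultp->data = combined;
  return 0;
}

/* Library-safe variadic wrapper.  */
static int
sd_concat (sd_t *resultp, size_t n, ...)
{
  va_list args;
  va_start (args, n);
  int ret = sd_vconcat (resultp, n, args);
  va_end (args);
  return ret;
}

/* Application wrapper that exits on allocation failure.  */
static sd_t
xsd_concat (size_t n, ...)
{
  sd_t result;
  va_list args;
  va_start (args, n);
  int ret = sd_vconcat (&result, n, args);
  va_end (args);
  if (ret < 0)
    {
      fputs ("memory exhausted\n", stderr);
      exit (EXIT_FAILURE);
    }
  return result;
}
```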
Bruno