From: Jeffrey Walton <noloader@gmail.com>
To: Bruno Haible <bruno@clisp.org>
Cc: bug-gnulib@gnu.org
Subject: Re: RFC: add a string-desc module
Date: Fri, 24 Mar 2023 19:20:19 -0400
Message-ID: <CAH8yC8mfviFog+wASgUsr7Te59WKxKF=xAeajY=K5OeMk6GRrg@mail.gmail.com>
In-Reply-To: <9740540.4vTCxPXJkl@nimes>
On Fri, Mar 24, 2023 at 5:50 PM Bruno Haible <bruno@clisp.org> wrote:
>
> In most application areas, it is not a problem if strings cannot contain NUL
> bytes, and thus the C type 'char *' with its NUL terminator is well usable.
>
> In areas where strings with embedded NUL bytes need to be handled, the common
> approach is to use a 'char * data' pointer together with a 'size_t nbytes'
> size. This works fine in code that constructs or manipulates strings with
> embedded NUL bytes. But when it comes to *storing* them, for example in an
> array or as key or value of a hash table, one needs a type that combines these
> two fields:
>
>   struct
>   {
>     size_t nbytes;
>     char * data;
>   }
>
> I propose to add a module that adds such a type, together with elementary
> functions that work on them.
>
> Such a type was long known as a "string descriptor" in VMS. It's also known
> as basic_string_view<char> in C++, or as String in Java.
>
> The type that I'm proposing does not have NUL byte appended to the data
> always and automatically, because I think it is more important to have a
> string_desc_substring function that does not cause memory allocation,
> than to have string_desc_c function (conversion to 'char *') that does
> not cause memory allocation.

I would be cautious about omitting the NUL. A natural thing to want
to do is print a string, and C routines usually expect a terminating
NUL.

Also, if you initialize the struct, for example from a string literal,
the underlying storage will likely include a terminating NUL anyway. I
understand the size member will not count the NUL, but it will still be
present in the storage. (Unless you do something ugly, like spelling
out the characters of the string one by one.)
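
To make the printing concern concrete, here is a minimal sketch of two ways to output a descriptor whose data is not NUL-terminated. The names sd_t, sd_fwrite and sd_snprintf are hypothetical and are not part of the proposed module; the sketch also assumes nbytes fits in an int for the "%.*s" case.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor type, mirroring the struct in the proposal. */
typedef struct
{
  size_t nbytes;
  const char *data;
} sd_t;

/* Write exactly nbytes bytes, embedded NULs included.
   Returns the number of bytes written.  */
static size_t
sd_fwrite (FILE *fp, sd_t s)
{
  return fwrite (s.data, 1, s.nbytes, fp);
}

/* Format through "%.*s": reads at most nbytes bytes, so no terminating
   NUL is required, but output stops early at an embedded NUL.
   Assumption: nbytes <= INT_MAX, since the precision is an int.  */
static int
sd_snprintf (char *buf, size_t bufsize, sd_t s)
{
  assert (s.nbytes <= INT_MAX);
  return snprintf (buf, bufsize, "%.*s", (int) s.nbytes, s.data);
}
```

With a descriptor { 5, "hello, world" }, sd_snprintf stores "hello" even though the byte after the substring is ',' rather than NUL. Passing the same data pointer to puts() would print the whole literal, which is the kind of mistake I am worried about.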
> The type that I'm proposing does not have two distinct fields
> nbytes_used and nbytes_allocated. Such a type, e.g. [1] attempts to
> cover the use-case of accumulating a string as well. But
> - The Java experience with String vs. StringBuffer/StringBuilder
> shows that it is cleaner to separate the two use cases.
> - For the use-case of accumulating a string, C programmers have been using
> ad-hoc code with n_used and n_allocated for a long time; there is
> no need for anything else (except for lazy people who want C to be
> a scripting language).
>
> The type that I'm proposing also does not have fields for heap management,
> such as a 'bool heap' [2] or a reference count. That's because I think that
> - managing the allocated memory of a data structure is a different
> problem than that of representing a string, and it can be achieved
> with data outside the string descriptor,
> - Such a field would make it wrong to simply assign a string descriptor
> to a variable.
>
> Please let me know what you think: Does this have a place in Gnulib? (Or
> should it stay in GNU gettext, where I need it for the Perl parser?)

A length-prefixed string may be a good idea. It could also help with
safer string handling functions and more efficient string operations,
because the length is already available.

So if you are going to add the "string descriptor", then I hope you
add some functions that make it easier for less experienced folks to
write safer code.
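
As a rough illustration of what I mean by safer helpers, here is a sketch of a bounds-checked substring and a length-aware comparison. All names (sd_t, sd_from_c, sd_substring, sd_equals) and the bool-returning error convention are my own assumptions, not the API being proposed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor type, mirroring the struct in the proposal. */
typedef struct
{
  size_t nbytes;
  const char *data;
} sd_t;

/* Wrap a NUL-terminated C string; the NUL is not counted in nbytes. */
static sd_t
sd_from_c (const char *s)
{
  sd_t r = { strlen (s), s };
  return r;
}

/* Store the byte range [start, end) of S in *OUT without allocating
   or copying.  Returns false and leaves *OUT untouched if the range
   is out of bounds.  */
static bool
sd_substring (sd_t s, size_t start, size_t end, sd_t *out)
{
  if (start > end || end > s.nbytes)
    return false;
  out->nbytes = end - start;
  out->data = s.data + start;
  return true;
}

/* Compare by length first, then by bytes; an embedded NUL is compared
   like any other byte.  */
static bool
sd_equals (sd_t a, sd_t b)
{
  return a.nbytes == b.nbytes
         && (a.nbytes == 0 || memcmp (a.data, b.data, a.nbytes) == 0);
}
```

Because the length travels with the pointer, the range check is a couple of comparisons and the equality test can reject strings of different length without reading any bytes.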
> [1] https://github.com/websnarf/bstrlib/blob/master/bstrlib.txt
> [2] https://github.com/maxim2266/str
Also see libbsd's stringlist.h for some inspiration,
https://cgit.freedesktop.org/libbsd/tree/include/bsd/stringlist.h .
Jeff