bug-gnulib@gnu.org mirror (unofficial)
* RFC: add a string-desc module
@ 2023-03-24 21:50 Bruno Haible
  2023-03-24 22:32 ` Paul Eggert
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Bruno Haible @ 2023-03-24 21:50 UTC (permalink / raw)
  To: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 2397 bytes --]

In most application areas, it is not a problem if strings cannot contain NUL
bytes, and thus the C type 'char *' with its NUL terminator is well usable.

In areas where strings with embedded NUL bytes need to be handled, the common
approach is to use a 'char * data' pointer together with a 'size_t nbytes'
size. This works fine in code that constructs or manipulates strings with
embedded NUL bytes. But when it comes to *storing* them, for example in an
array or as key or value of a hash table, one needs a type that combines these
two fields:

  struct
  {
    size_t nbytes;
    char * data;
  }

I propose to add a module that adds such a type, together with elementary
functions that work on them.

Such a type was long known as a "string descriptor" in VMS. It's also known
as basic_string_view<char> in C++, or as String in Java.

The type that I'm proposing does not have a NUL byte appended to the data
always and automatically, because I think it is more important to have a
string_desc_substring function that does not cause memory allocation
than to have a string_desc_c function (conversion to 'char *') that does
not cause memory allocation.
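
A minimal sketch of this trade-off, assuming a hypothetical sd_view type
and helper names (they are illustrative only and are not the proposed
string_desc_t API). The substring is pure pointer arithmetic, while the
conversion to a C string has to copy because the terminating NUL needs
storage of its own:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the proposed type; names are illustrative.  */
typedef struct { size_t nbytes; const char *data; } sd_view;

/* Substring: only pointer arithmetic, no allocation.  The result shares
   storage with S, so it is in general not NUL-terminated.  */
static sd_view
sd_view_substring (sd_view s, size_t start, size_t end)
{
  assert (start <= end && end <= s.nbytes);
  sd_view result = { end - start, s.data + start };
  return result;
}

/* Conversion to a C string: must copy, because a terminating NUL has to
   be stored somewhere.  Returns NULL if malloc fails.  */
static char *
sd_view_to_c (sd_view s)
{
  char *copy = malloc (s.nbytes + 1);
  if (copy != NULL)
    {
      if (s.nbytes > 0)
        memcpy (copy, s.data, s.nbytes);
      copy[s.nbytes] = '\0';
    }
  return copy;
}
```

With a NUL-terminated representation the costs would be reversed: the
conversion would be free, but every substring that does not end at the
end of the original string would need a copy.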

The type that I'm proposing does not have two distinct fields
nbytes_used and nbytes_allocated. Such a type, e.g. [1] attempts to
cover the use-case of accumulating a string as well. But
  - The Java experience with String vs. StringBuffer/StringBuilder
    shows that it is cleaner to separate the two use cases.
  - For the use-case of accumulating a string, C programmers have been using
    ad-hoc code with n_used and n_allocated for a long time; there is
    no need for anything else (except for lazy people who want C to be
    a scripting language).

The type that I'm proposing also does not have fields for heap management,
such as a 'bool heap' [2] or a reference count. That's because I think that
  - managing the allocated memory of a data structure is a different
    problem than that of representing a string, and it can be achieved
    with data outside the string descriptor,
  - Such a field would make it wrong to simply assign a string descriptor
    to a variable.

Please let me know what you think: Does this have a place in Gnulib? (Or
should it stay in GNU gettext, where I need it for the Perl parser?)

Bruno

[1] https://github.com/websnarf/bstrlib/blob/master/bstrlib.txt
[2] https://github.com/maxim2266/str

[-- Attachment #2: string-desc.h --]
[-- Type: text/x-chdr, Size: 4678 bytes --]

/* GNU gettext - internationalization aids
   Copyright (C) 2023 Free Software Foundation, Inc.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */

/* Written by Bruno Haible <bruno@clisp.org>, 2023.  */

#ifndef _STRING_DESC_H
#define _STRING_DESC_H 1

/* Get size_t, ptrdiff_t.  */
#include <stddef.h>

/* Get bool.  */
#include <stdbool.h>


#ifdef __cplusplus
extern "C" {
#endif


/* Type describing a string that may contain NUL bytes.
   It's merely a descriptor of an array of bytes.  */
typedef struct string_desc_t string_desc_t;
struct string_desc_t
{
  size_t nbytes;
  char *data;
};

/* String descriptors can be passed and returned by value.  */


/* ==== Side-effect-free operations on string descriptors ==== */

/* Return the length of the string S.  */
extern size_t string_desc_length (string_desc_t s);

/* Return the byte at index I of string S.
   I must be < length(S).  */
extern char string_desc_char_at (string_desc_t s, size_t i);

/* Return a read-only view of the bytes of S.  */
extern const char * string_desc_data (string_desc_t s);

/* Return true if S is the empty string.  */
extern bool string_desc_is_empty (string_desc_t s);

/* Return true if S starts with PREFIX.  */
extern bool string_desc_startswith (string_desc_t s, string_desc_t prefix);

/* Return true if S ends with SUFFIX.  */
extern bool string_desc_endswith (string_desc_t s, string_desc_t suffix);

/* Return > 0, == 0, or < 0 if A > B, A == B, A < B.
   This uses a lexicographic ordering, where the bytes are compared as
   'unsigned char'.  */
extern int string_desc_cmp (string_desc_t a, string_desc_t b);

/* Return the index of the first occurrence of C in S,
   or -1 if there is none.  */
extern ptrdiff_t string_desc_index (string_desc_t s, char c);

/* Return the index of the last occurrence of C in S,
   or -1 if there is none.  */
extern ptrdiff_t string_desc_last_index (string_desc_t s, char c);

/* Return the index of the first occurrence of NEEDLE in HAYSTACK,
   or -1 if there is none.  */
extern ptrdiff_t string_desc_contains (string_desc_t haystack, string_desc_t needle);

/* Return a string that represents the C string S, of length strlen (S).  */
extern string_desc_t string_desc_from_c (const char *s);

/* Return the substring of S, starting at offset START and ending at offset END.
   START must be <= END.
   The result is of length END - START.
   The result must not be freed (since its storage is part of the storage
   of S).  */
extern string_desc_t string_desc_substring (string_desc_t s, size_t start, size_t end);


/* ==== Memory-allocating operations on string descriptors ==== */

/* Return a string of length N, with uninitialized contents.  */
extern string_desc_t string_desc_new (size_t n);

/* Return a string of length N, at the given memory address.  */
extern string_desc_t string_desc_new_addr (size_t n, char *addr);

/* Return a string of length N, filled with C.  */
extern string_desc_t string_desc_new_filled (size_t n, char c);

/* Return a copy of string S.  */
extern string_desc_t string_desc_copy (string_desc_t s);

/* Return the concatenation of N strings.  N must be > 0.  */
extern string_desc_t string_desc_concat (size_t n, string_desc_t string1, ...);

/* Return a copy of string S, as a NUL-terminated C string.  */
extern char * string_desc_c (string_desc_t s);


/* ==== Operations with side effects on string descriptors ==== */

/* Overwrite the byte at index I of string S with C.
   I must be < length(S).  */
extern void string_desc_set_char_at (string_desc_t s, size_t i, char c);

/* Fill part of S, starting at offset START and ending at offset END,
   with copies of C.
   START must be <= END.  */
extern void string_desc_fill (string_desc_t s, size_t start, size_t end, char c);

/* Overwrite part of S with T, starting at offset START.
   START + length(T) must be <= length (S).  */
extern void string_desc_overwrite (string_desc_t s, size_t start, string_desc_t t);

/* Free S.  */
extern void string_desc_free (string_desc_t s);


#ifdef __cplusplus
}
#endif


#endif /* _STRING_DESC_H */

[-- Attachment #3: string-desc.c --]
[-- Type: text/x-csrc, Size: 6596 bytes --]

/* GNU gettext - internationalization aids
   Copyright (C) 2023 Free Software Foundation, Inc.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */

/* Written by Bruno Haible <bruno@clisp.org>, 2023.  */

#ifdef HAVE_CONFIG_H
# include "config.h"
#endif

/* Specification.  */
#include "string-desc.h"

#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

#include "xalloc.h"


/* ==== Side-effect-free operations on string descriptors ==== */

size_t
string_desc_length (string_desc_t s)
{
  return s.nbytes;
}

char
string_desc_char_at (string_desc_t s, size_t i)
{
  if (!(i < s.nbytes))
    /* Invalid argument.  */
    abort ();
  return s.data[i];
}

const char *
string_desc_data (string_desc_t s)
{
  return s.data;
}

bool
string_desc_is_empty (string_desc_t s)
{
  return s.nbytes == 0;
}

bool
string_desc_startswith (string_desc_t s, string_desc_t prefix)
{
  return (s.nbytes >= prefix.nbytes
          && (prefix.nbytes == 0
              || memcmp (s.data, prefix.data, prefix.nbytes) == 0));
}

bool
string_desc_endswith (string_desc_t s, string_desc_t suffix)
{
  return (s.nbytes >= suffix.nbytes
          && (suffix.nbytes == 0
              || memcmp (s.data + (s.nbytes - suffix.nbytes), suffix.data,
                         suffix.nbytes) == 0));
}

int
string_desc_cmp (string_desc_t a, string_desc_t b)
{
  if (a.nbytes > b.nbytes)
    {
      if (b.nbytes == 0)
        return 1;
      return (memcmp (a.data, b.data, b.nbytes) < 0 ? -1 : 1);
    }
  else if (a.nbytes < b.nbytes)
    {
      if (a.nbytes == 0)
        return -1;
      return (memcmp (a.data, b.data, a.nbytes) > 0 ? 1 : -1);
    }
  else /* a.nbytes == b.nbytes */
    {
      if (a.nbytes == 0)
        return 0;
      return memcmp (a.data, b.data, a.nbytes);
    }
}

ptrdiff_t
string_desc_index (string_desc_t s, char c)
{
  if (s.nbytes > 0)
    {
      void *found = memchr (s.data, (unsigned char) c, s.nbytes);
      if (found != NULL)
        return (char *) found - s.data;
    }
  return -1;
}

ptrdiff_t
string_desc_last_index (string_desc_t s, char c)
{
  if (s.nbytes > 0)
    {
      void *found = memrchr (s.data, (unsigned char) c, s.nbytes);
      if (found != NULL)
        return (char *) found - s.data;
    }
  return -1;
}

ptrdiff_t
string_desc_contains (string_desc_t haystack, string_desc_t needle)
{
  if (needle.nbytes == 0)
    return 0;
  void *found =
    memmem (haystack.data, haystack.nbytes, needle.data, needle.nbytes);
  if (found != NULL)
    return (char *) found - haystack.data;
  else
    return -1;
}

string_desc_t
string_desc_from_c (const char *s)
{
  string_desc_t result;

  result.nbytes = strlen (s);
  result.data = (char *) s;

  return result;
}

string_desc_t
string_desc_substring (string_desc_t s, size_t start, size_t end)
{
  string_desc_t result;

  if (!(start <= end))
    /* Invalid arguments.  */
    abort ();

  result.nbytes = end - start;
  result.data = s.data + start;

  return result;
}


/* ==== Memory-allocating operations on string descriptors ==== */

string_desc_t
string_desc_new (size_t n)
{
  string_desc_t result;

  result.nbytes = n;
  if (n == 0)
    result.data = NULL;
  else
    result.data = (char *) xmalloc (n);

  return result;
}

string_desc_t
string_desc_new_addr (size_t n, char *addr)
{
  string_desc_t result;

  result.nbytes = n;
  if (n == 0)
    result.data = NULL;
  else
    result.data = addr;

  return result;
}

string_desc_t
string_desc_new_filled (size_t n, char c)
{
  string_desc_t result;

  result.nbytes = n;
  if (n == 0)
    result.data = NULL;
  else
    {
      result.data = (char *) xmalloc (n);
      memset (result.data, (unsigned char) c, n);
    }

  return result;
}

string_desc_t
string_desc_copy (string_desc_t s)
{
  string_desc_t result;
  size_t n = s.nbytes;

  result.nbytes = n;
  if (n == 0)
    result.data = NULL;
  else
    {
      result.data = (char *) xmalloc (n);
      memcpy (result.data, s.data, n);
    }

  return result;
}

string_desc_t
string_desc_concat (size_t n, string_desc_t string1, ...)
{
  if (n == 0)
    /* Invalid argument.  */
    abort ();

  size_t total = 0;
  total += string1.nbytes;
  if (n > 1)
    {
      va_list other_strings;
      size_t i;

      va_start (other_strings, string1);
      /* Do not modify N here: the copying pass below needs it again.  */
      for (i = n - 1; i > 0; i--)
        {
          string_desc_t arg = va_arg (other_strings, string_desc_t);
          total += arg.nbytes;
        }
      va_end (other_strings);
    }

  char *combined = (char *) xmalloc (total);
  size_t pos = 0;
  /* An empty string may have a NULL data pointer; skip the copy then.  */
  if (string1.nbytes > 0)
    memcpy (combined, string1.data, string1.nbytes);
  pos += string1.nbytes;
  if (n > 1)
    {
      va_list other_strings;
      size_t i;

      va_start (other_strings, string1);
      for (i = n - 1; i > 0; i--)
        {
          string_desc_t arg = va_arg (other_strings, string_desc_t);
          if (arg.nbytes > 0)
            memcpy (combined + pos, arg.data, arg.nbytes);
          pos += arg.nbytes;
        }
      va_end (other_strings);
    }

  string_desc_t result;
  result.nbytes = total;
  result.data = combined;

  return result;
}

char *
string_desc_c (string_desc_t s)
{
  size_t n = s.nbytes;
  char *result = (char *) xmalloc (n + 1);
  if (n > 0)
    memcpy (result, s.data, n);
  result[n] = '\0';

  return result;
}


/* ==== Operations with side effects on string descriptors ==== */

void
string_desc_set_char_at (string_desc_t s, size_t i, char c)
{
  if (!(i < s.nbytes))
    /* Invalid argument.  */
    abort ();
  s.data[i] = c;
}

void
string_desc_fill (string_desc_t s, size_t start, size_t end, char c)
{
  if (!(start <= end))
    /* Invalid arguments.  */
    abort ();

  if (start < end)
    memset (s.data + start, (unsigned char) c, end - start);
}

void
string_desc_overwrite (string_desc_t s, size_t start, string_desc_t t)
{
  if (!(start + t.nbytes <= s.nbytes))
    /* Invalid arguments.  */
    abort ();

  if (t.nbytes > 0)
    memcpy (s.data + start, t.data, t.nbytes);
}

void
string_desc_free (string_desc_t s)
{
  free (s.data);
}


* Re: RFC: add a string-desc module
  2023-03-24 21:50 RFC: add a string-desc module Bruno Haible
@ 2023-03-24 22:32 ` Paul Eggert
  2023-03-25 11:39   ` Bruno Haible
  2023-03-24 23:20 ` Jeffrey Walton
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Paul Eggert @ 2023-03-24 22:32 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib

On 2023-03-24 14:50, Bruno Haible wrote:
>    struct
>    {
>      size_t nbytes;
>      char * data;
>    }

One minor comment: use idx_t instead of size_t, for the usual reasons.
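
The message does not spell out "the usual reasons". One commonly cited
one is that unsigned arithmetic wraps around silently. The sketch below
illustrates only that point; it assumes that idx_t is a signed type as
wide as ptrdiff_t, and my_idx_t and both function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: Gnulib's idx_t is a signed type as wide as ptrdiff_t;
   this local typedef merely stands in for it.  */
typedef ptrdiff_t my_idx_t;

/* Is there a byte after position I in a string of length N?
   Unsigned version: when N == 0, N - 1 wraps around to SIZE_MAX,
   so the test is wrongly true.  */
static bool
has_next_unsigned (size_t i, size_t n)
{
  return i < n - 1;
}

/* Signed version: N - 1 is simply -1, and the test is false.  */
static bool
has_next_signed (my_idx_t i, my_idx_t n)
{
  assert (i >= 0 && n >= 0);
  return i < n - 1;
}
```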

Also it might be a bit more efficient to put the pointer first.



* Re: RFC: add a string-desc module
  2023-03-24 21:50 RFC: add a string-desc module Bruno Haible
  2023-03-24 22:32 ` Paul Eggert
@ 2023-03-24 23:20 ` Jeffrey Walton
  2023-03-25  6:25   ` Vivien Kraus
  2023-03-25 11:49   ` Bruno Haible
  2023-03-25  6:21 ` Vivien Kraus
  2023-03-27 10:15 ` Simon Josefsson via Gnulib discussion list
  3 siblings, 2 replies; 12+ messages in thread
From: Jeffrey Walton @ 2023-03-24 23:20 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib

On Fri, Mar 24, 2023 at 5:50 PM Bruno Haible <bruno@clisp.org> wrote:
>
> In most application areas, it is not a problem if strings cannot contain NUL
> bytes, and thus the C type 'char *' with its NUL terminator is well usable.
>
> In areas where strings with embedded NUL bytes need to be handled, the common
> approach is to use a 'char * data' pointer together with a 'size_t nbytes'
> size. This works fine in code that constructs or manipulates strings with
> embedded NUL bytes. But when it comes to *storing* them, for example in an
> array or as key or value of a hash table, one needs a type that combines these
> two fields:
>
>   struct
>   {
>     size_t nbytes;
>     char * data;
>   }
>
> I propose to add a module that adds such a type, together with elementary
> functions that work on them.
>
> Such a type was long known as a "string descriptor" in VMS. It's also known
> as basic_string_view<char> in C++, or as String in Java.
>
> The type that I'm proposing does not have a NUL byte appended to the data
> always and automatically, because I think it is more important to have a
> string_desc_substring function that does not cause memory allocation
> than to have a string_desc_c function (conversion to 'char *') that does
> not cause memory allocation.

I would take caution if not including a NUL. A natural thing to want
to do is print a string, and C-based routines usually expect a
terminating NUL.

Also, if you initialize the struct, then the allocated string will
likely include a terminating NUL. I understand the size member will
omit the NUL, but it will be present anyway in the string. (Unless
you do something ugly, like spell out the characters of the string).

> The type that I'm proposing does not have two distinct fields
> nbytes_used and nbytes_allocated. Such a type, e.g. [1] attempts to
> cover the use-case of accumulating a string as well. But
>   - The Java experience with String vs. StringBuffer/StringBuilder
>     shows that it is cleaner to separate the two use cases.
>   - For the use-case of accumulating a string, C programmers have been using
>     ad-hoc code with n_used and n_allocated for a long time; there is
>     no need for anything else (except for lazy people who want C to be
>     a scripting language).
>
> The type that I'm proposing also does not have fields for heap management,
> such as a 'bool heap' [2] or a reference count. That's because I think that
>   - managing the allocated memory of a data structure is a different
>     problem than that of representing a string, and it can be achieved
>     with data outside the string descriptor,
>   - Such a field would make it wrong to simply assign a string descriptor
>     to a variable.
>
> Please let me know what you think: Does this have a place in Gnulib? (Or
> should it stay in GNU gettext, where I need it for the Perl parser?)

A length prefixed string may be a good idea. It could also help with
safer string handling functions and efficient operations on a string
because length is already available.

So if you are going to add the "string descriptor", then I hope you
add some functions to make it easier for less experienced folks to
write safer code.

> [1] https://github.com/websnarf/bstrlib/blob/master/bstrlib.txt
> [2] https://github.com/maxim2266/str

Also see libbsd's stringlist.h for some inspiration,
https://cgit.freedesktop.org/libbsd/tree/include/bsd/stringlist.h .

Jeff



* Re: RFC: add a string-desc module
  2023-03-24 21:50 RFC: add a string-desc module Bruno Haible
  2023-03-24 22:32 ` Paul Eggert
  2023-03-24 23:20 ` Jeffrey Walton
@ 2023-03-25  6:21 ` Vivien Kraus
  2023-03-25 11:56   ` Bruno Haible
  2023-03-27 10:15 ` Simon Josefsson via Gnulib discussion list
  3 siblings, 1 reply; 12+ messages in thread
From: Vivien Kraus @ 2023-03-25  6:21 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib

Hello!

I frequently use ad-hoc code for this, however in library code, in
which xmalloc is not much used.

I learn new gnulib things primarily from the manual. Do you plan to
document it there?

On Friday, 24 March 2023 at 22:50 +0100, Bruno Haible wrote:
> /* Return a copy of string S, as a NUL-terminated C string.  */
> extern char * string_desc_c (string_desc_t s);

Would it be appropriate to use the attribute module and mark this
ATTRIBUTE_DEALLOC_FREE?

Best regards,

Vivien



* Re: RFC: add a string-desc module
  2023-03-24 23:20 ` Jeffrey Walton
@ 2023-03-25  6:25   ` Vivien Kraus
  2023-03-25 11:49   ` Bruno Haible
  1 sibling, 0 replies; 12+ messages in thread
From: Vivien Kraus @ 2023-03-25  6:25 UTC (permalink / raw)
  To: noloader, Bruno Haible; +Cc: bug-gnulib

On Friday, 24 March 2023 at 19:20 -0400, Jeffrey Walton wrote:
> > The type that I'm proposing does not have a NUL byte appended to the data
> > always and automatically, because I think it is more important to have a
> > string_desc_substring function that does not cause memory allocation
> > than to have a string_desc_c function (conversion to 'char *') that does
> > not cause memory allocation.
> 
> I would take caution if not including a NUL. A natural thing to want
> to do is print a string, and C-based routines usually expect a
> terminating NUL.
> 
> Also, if you initialize the struct, then the allocated string will
> likely include a terminating NUL. I understand the size member will
> omit the NUL, but it will be present anyway in the string. (Unless
> you do something ugly, like spell out the characters of the string).

From what I understand, the proposed substring function cannot add a
NUL byte without doing a copy first.

Vivien



* Re: RFC: add a string-desc module
  2023-03-24 22:32 ` Paul Eggert
@ 2023-03-25 11:39   ` Bruno Haible
  0 siblings, 0 replies; 12+ messages in thread
From: Bruno Haible @ 2023-03-25 11:39 UTC (permalink / raw)
  To: bug-gnulib, Paul Eggert

Paul Eggert wrote:
> >    struct
> >    {
> >      size_t nbytes;
> >      char * data;
> >    }
> 
> One minor comment: use idx_t instead of size_t, for the usual reasons.

Right, done. Thanks for the reminder.

> Also it might be a bit more efficient to put the pointer first.

On some CPUs probably, but not on others. Unless it's a clear win, I prefer
to avoid such code changes. The entire struct fits into a cache line anyway.

Even an attribute _Alignas(2*sizeof(long)) would only help on NetBSD, IIRC,
because for heap-allocated data, 2*sizeof(long) is already the default
alignment on most platforms.

Bruno






* Re: RFC: add a string-desc module
  2023-03-24 23:20 ` Jeffrey Walton
  2023-03-25  6:25   ` Vivien Kraus
@ 2023-03-25 11:49   ` Bruno Haible
  2023-03-25 15:51     ` Paul Eggert
  1 sibling, 1 reply; 12+ messages in thread
From: Bruno Haible @ 2023-03-25 11:49 UTC (permalink / raw)
  To: noloader; +Cc: bug-gnulib

Jeffrey Walton wrote:
> A natural thing to want
> to do is print a string, and C-based routines usually expect a
> terminating NUL.

I'll add a comment regarding printf with the "%.*s" directive.

> Also, if you initialize the struct, then the allocated string will
> likely include a terminating NUL. I understand the size member will
> omit the NUL, but it will be present anyway in the string.

No; it depends where the 'char *' comes from. If it is a pointer into
a piece of memory read through read_file, for example, there will be
no NUL terminator.

Also, in C you can write
  char buf[4] = "abcd";
which does not add a NUL.
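
A small sketch of that case, assuming a hypothetical sd_bytes type in
place of the proposed one. The array holds exactly four bytes, so its
length has to come from sizeof rather than from strlen:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor type, mirroring the proposal.  */
typedef struct { size_t nbytes; const char *data; } sd_bytes;

/* Exactly four bytes: the initializer's terminating NUL is dropped
   because the array has no room for it (valid in C, not in C++).  */
static const char buf[4] = "abcd";

/* Describe BUF without ever calling strlen on it.  */
static sd_bytes
describe_buf (void)
{
  sd_bytes d = { sizeof buf, buf };
  assert (d.nbytes == 4);
  return d;
}

/* Compare a descriptor with N bytes at P, NUL bytes included.  */
static bool
sd_bytes_equals_mem (sd_bytes d, const void *p, size_t n)
{
  return d.nbytes == n && (n == 0 || memcmp (d.data, p, n) == 0);
}
```

Calling strlen (buf) here would read past the end of the array, which is
exactly the situation a size-carrying descriptor avoids.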

> A length prefixed string may be a good idea.

https://github.com/antirez/sds does it like this. But again, this
does not allow for an allocation-free substring function.

> So if you are going to add the "string descriptor", then I hope you
> add some functions to make it easier for less experienced folks to
> write safer code.

I believe all these functions are already in the proposal.

> Also see libbsd's stringlist.h for some inspiration,
> https://cgit.freedesktop.org/libbsd/tree/include/bsd/stringlist.h .

This is unrelated, AFAICS. It's not about a string, but about an
extensible array of strings.

Bruno






* Re: RFC: add a string-desc module
  2023-03-25  6:21 ` Vivien Kraus
@ 2023-03-25 11:56   ` Bruno Haible
  0 siblings, 0 replies; 12+ messages in thread
From: Bruno Haible @ 2023-03-25 11:56 UTC (permalink / raw)
  To: bug-gnulib, Vivien Kraus

Vivien Kraus wrote:
> I frequently use ad-hoc code for this, however in library code, in
> which xmalloc is not much used.

Good point. I'll need to duplicate the interface of the memory-allocating
functions: one variant with 'x', which uses xmalloc, and one without
'x', for use in libraries.

> I learn new gnulib things primarily from the manual. Do you plan to
> document it there?

Yes, sure. The reference documentation can stay in the .h file, but
an overview and general usage section belongs in the documentation.

> > /* Return a copy of string S, as a NUL-terminated C string.  */
> > extern char * string_desc_c (string_desc_t s);
> 
> Would it be appropriate to use the attribute module and mark this
> ATTRIBUTE_DEALLOC_FREE?

Good point, yes. Will do!
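
A rough sketch of what such a marking achieves, assuming GCC 11 or
newer. MY_DEALLOC_FREE and bytes_to_c are hypothetical names used only
for illustration; they are not Gnulib's macro or the proposed function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical macro, not Gnulib's: on GCC 11 or newer it states that
   the result must be released with free; elsewhere it expands to
   nothing.  */
#if defined __GNUC__ && __GNUC__ >= 11 && !defined __clang__
# define MY_DEALLOC_FREE __attribute__ ((__malloc__ (free, 1)))
#else
# define MY_DEALLOC_FREE
#endif

static char *bytes_to_c (const char *data, size_t n) MY_DEALLOC_FREE;

/* Return a freshly allocated NUL-terminated copy of the N bytes at DATA,
   or NULL if allocation fails.  */
static char *
bytes_to_c (const char *data, size_t n)
{
  assert (data != NULL || n == 0);
  char *result = malloc (n + 1);
  if (result != NULL)
    {
      if (n > 0)
        memcpy (result, data, n);
      result[n] = '\0';
    }
  return result;
}
```

With the attribute in place, a compiler that supports it can warn when
the result is passed to a deallocator other than free.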

Thanks for your review and remarks.

Bruno






* Re: RFC: add a string-desc module
  2023-03-25 11:49   ` Bruno Haible
@ 2023-03-25 15:51     ` Paul Eggert
  2023-03-28 22:40       ` Bruno Haible
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Eggert @ 2023-03-25 15:51 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib, noloader

On 2023-03-25 04:49, Bruno Haible wrote:

> I'll add a comment regarding printf with the "%.*s" directive.

That works only if the string lacks NULs and its length fits into int, 
and one must also convert the idx_t length to int (e.g., via a cast 
which I find tricky). Although these limitations could be documented, it 
might also be good to have an API like quotearg to generate a quoted or 
quotable string that can be printed with plain %s.
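
A short sketch of the first two limitations, using only standard C
library calls; the helper names percent_s_length and write_all are
hypothetical. The "%.*s" directive stops at the first NUL and takes its
precision as an int, whereas fwrite emits every byte:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Number of bytes that "%.*s" would emit for the N bytes at DATA.
   The directive stops at the first NUL, and the precision is an int.  */
static int
percent_s_length (const char *data, size_t n)
{
  assert (n <= INT_MAX);   /* the cast below is only safe under this check */
  return snprintf (NULL, 0, "%.*s", (int) n, data);
}

/* Write all N bytes to FP, embedded NULs included.
   Returns true on success.  */
static bool
write_all (FILE *fp, const char *data, size_t n)
{
  return n == 0 || fwrite (data, 1, n, fp) == n;
}
```

For the five bytes "ab\0cd", percent_s_length reports 2, so the part
after the embedded NUL would be lost with printf.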



* Re: RFC: add a string-desc module
  2023-03-24 21:50 RFC: add a string-desc module Bruno Haible
                   ` (2 preceding siblings ...)
  2023-03-25  6:21 ` Vivien Kraus
@ 2023-03-27 10:15 ` Simon Josefsson via Gnulib discussion list
  2023-03-28 22:49   ` Bruno Haible
  3 siblings, 1 reply; 12+ messages in thread
From: Simon Josefsson via Gnulib discussion list @ 2023-03-27 10:15 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 1132 bytes --]

Bruno Haible <bruno@clisp.org> writes:

>   struct
>   {
>     size_t nbytes;
>     char * data;
>   }
>
> I propose to add a module that adds such a type, together with elementary
> functions that work on them.

I think this is a useful contribution, however I see two deal-breakers
for having it in gnulib -- both related to use in libraries.  I think
string helper types/functions like this are useful not only in
applications but also in libraries.  Thus:

 1) License - there really isn't much novelty here, how about making
 this public domain or LGPLv2+?

 2) Applicability to use in a library - using x*alloc and abort is
 frowned upon in libraries.  Libraries should return error codes on
 expected errors (and I argue memory allocation failure is an expected
 error), and not cause application exits.

What do you think?

One way to resolve 2) is to have two variants of this functionality: one
low-level variant that doesn't abort the application on errors, and one
high-level variant that behaves like your implementation.  The
high-level variant could depend on the low-level variant, but that's not
essential.
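
A minimal sketch of such a two-level split, under the assumption of a
hypothetical sd_buf type and function names (this is not the interface
that was proposed or later committed). The low-level function reports
failure through its return value, and the high-level one wraps it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t nbytes; char *data; } sd_buf;   /* hypothetical */

/* Low-level variant, suitable for libraries: reports allocation failure
   to the caller.  Returns 0 on success, -1 on failure.  */
static int
sd_buf_copy (sd_buf *resultp, sd_buf s)
{
  assert (resultp != NULL);
  char *data = NULL;
  if (s.nbytes > 0)
    {
      data = malloc (s.nbytes);
      if (data == NULL)
        return -1;
      memcpy (data, s.data, s.nbytes);
    }
  resultp->nbytes = s.nbytes;
  resultp->data = data;
  return 0;
}

/* High-level variant for applications: never returns on failure.  */
static sd_buf
xsd_buf_copy (sd_buf s)
{
  sd_buf result;
  if (sd_buf_copy (&result, s) < 0)
    {
      fputs ("memory exhausted\n", stderr);
      abort ();
    }
  return result;
}
```

Here the high-level variant depends on the low-level one, so the copying
logic exists only once.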

/Simon



* Re: RFC: add a string-desc module
  2023-03-25 15:51     ` Paul Eggert
@ 2023-03-28 22:40       ` Bruno Haible
  0 siblings, 0 replies; 12+ messages in thread
From: Bruno Haible @ 2023-03-28 22:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: bug-gnulib, noloader

Paul Eggert wrote:
> > I'll add a comment regarding printf with the "%.*s" directive.
> 
> That works only if the string lacks NULs

Ouch, indeed.

> and its length fits into int, 
> and one must also convert the idx_t length to int (e.g., via a cast 
> which I find tricky).

I've now documented that "%.*s" is NOT the solution.

> Although these limitations could be documented, it 
> might also be good to have an API like quotearg to generate a quoted or 
> quotable string that can be printed with plain %s.

Good point. I've added wrappers around the quotearg functions. Fortunately,
most of the quotearg functions already have a *_mem variant that was designed
precisely for this case.

Bruno






* Re: RFC: add a string-desc module
  2023-03-27 10:15 ` Simon Josefsson via Gnulib discussion list
@ 2023-03-28 22:49   ` Bruno Haible
  0 siblings, 0 replies; 12+ messages in thread
From: Bruno Haible @ 2023-03-28 22:49 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: bug-gnulib

Simon Josefsson wrote:
> I think this is a useful contribution,

Thanks.

> however I see two deal-breakers
> for having it in gnulib -- both related to use in libraries.  I think
> string helper types/functions like this are useful not only in
> applications but also in libraries.  Thus:
> 
>  1) License - there really isn't much novelty here, how about making
>  this public domain or LGPLv2+?

Not public domain — it does not protect the user from patent claims.

Not MIT license — I don't intend to make gifts to proprietary software
vendors. It's bad enough that some companies ignore the requirements
of the GPL. <https://www.youtube.com/watch?v=5rgsXq2e7Ck>

I've put the core module under LGPLv3+.

If you want it under LGPLv2+, it would be OK for my part, but we would
have to relax the 'memrchr' module to LGPLv2+ first.

>  2) Applicability to use in a library - using x*alloc and abort is
>  frowned upon in libraries.  Libraries should return error codes on
>  expected errors (and I argue memory allocation failure is an expected
>  error), and not cause application exits.

Done by separating library-safe memory allocations and checked memory
allocations into separate modules.

> One way to resolve 2) is to have two variants of this functionality: one
> low-level variant that doesn't abort the application on errors, and one
> high-level variant that behaves like your implementation.  The
> high-level variant could depend on the low-level variant, but that's not
> essential.

Yes, that's how I did it, for the most part. I couldn't do this so easily
for the string_desc_concat function, though, due to varargs.

Bruno




