bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed
* Re: [PATCH] stat: don't explicitly request file size for filenames
       [not found] <7AC7AA2A-EDEA-4A80-9A5A-02FAA2D823EF@whamcloud.com>
@ 2019-07-04 10:44 ` Pádraig Brady
  2019-07-04 11:24   ` [PATCH] areadlink-with-size: guess a lower bound with 0 size Pádraig Brady
                     ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Pádraig Brady @ 2019-07-04 10:44 UTC (permalink / raw)
  To: Andreas Dilger, coreutils@gnu.org; +Cc: bug-gnulib, Jeff Layton

[-- Attachment #1: Type: text/plain, Size: 2468 bytes --]

On 03/07/19 21:24, Andreas Dilger wrote:
> When calling 'stat -c %N' to print the filename, don't explicitly
> request the size of the file via statx(), as it may add overhead on
> some filesystems.  The size is only needed to optimize an allocation
> for the relatively rare case of reading a symlink name, and the worst
> effect is a somewhat-too-large temporary buffer may be allocated for
> areadlink_with_size(), or internal retries if buffer is too small.
> 
> The file size will be returned by statx() on most filesystems, even
> if not requested, unless the filesystem considers this to be too
> expensive for that file, in which case the tradeoff is worthwhile.
> 
> * src/stat.c: Don't explicitly request STATX_SIZE for filenames.
> Start with a 1KB buffer for areadlink_with_size() if st_size unset.
> ---
>  src/stat.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/stat.c b/src/stat.c
> index ec0bb7d..c887013 100644
> --- a/src/stat.c
> +++ b/src/stat.c
> @@ -1282,7 +1282,7 @@ fmt_to_mask (char fmt)
>    switch (fmt)
>      {
>      case 'N':
> -      return STATX_MODE|STATX_SIZE;
> +      return STATX_MODE;
>      case 'd':
>      case 'D':
>        return STATX_MODE;
> @@ -1491,7 +1491,9 @@ print_stat (char *pformat, size_t prefix_len, unsigned int m,
>        out_string (pformat, prefix_len, quoteN (filename));
>        if (S_ISLNK (statbuf->st_mode))
>          {
> -          char *linkname = areadlink_with_size (filename, statbuf->st_size);
> +          /* if statx() didn't set size, most symlinks are under 1KB */
> +          char *linkname = areadlink_with_size (filename, statbuf->st_size ?:
> +                                                1023);

It would be nice to have areadlink_with_size treat 0 as auto select some lower bound.
There is already logic there, and it would be generally helpful,
as st_size can often be 0, as shown with:

  $ strace -e readlink stat -c %N /proc/$$/cwd
  readlink("/proc/9036/cwd", "/", 1)      = 1
  readlink("/proc/9036/cwd", "/h", 2)     = 2
  readlink("/proc/9036/cwd", "/hom", 4)   = 4
  readlink("/proc/9036/cwd", "/home/pa", 8) = 8
  readlink("/proc/9036/cwd", "/home/padraig", 16) = 13

With the attached gnulib diff, we get:

  $ strace -e readlink git/coreutils/src/stat -c %N /proc/$$/cwd
  readlink("/proc/12512/cwd", "/home/padraig", 1024) = 13

I'll push both later.

thanks!
Pádraig

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: areadlink-zero-size.diff --]
[-- Type: text/x-patch; name="areadlink-zero-size.diff", Size: 1368 bytes --]

diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
index eacad3f..2fbe51c 100644
--- a/lib/areadlink-with-size.c
+++ b/lib/areadlink-with-size.c
@@ -36,14 +36,15 @@
    check, so it's OK to guess too small on hosts where there is no
    arbitrary limit to symbolic link length.  */
 #ifndef SYMLINK_MAX
-# define SYMLINK_MAX 1024
+# define SYMLINK_MAX 1023
 #endif
 
 #define MAXSIZE (SIZE_MAX < SSIZE_MAX ? SIZE_MAX : SSIZE_MAX)
 
 /* Call readlink to get the symbolic link value of FILE.
    SIZE is a hint as to how long the link is expected to be;
-   typically it is taken from st_size.  It need not be correct.
+   typically it is taken from st_size.  It need not be correct,
+   and a value of 0 (or more than 8Ki) will select an appropriate lower bound.
    Return a pointer to that NUL-terminated string in malloc'd storage.
    If readlink fails, malloc fails, or if the link value is longer
    than SSIZE_MAX, return NULL (caller may use errno to diagnose).  */
@@ -61,7 +62,7 @@ areadlink_with_size (char const *file, size_t size)
                           : INITIAL_LIMIT_BOUND);
 
   /* The initial buffer size for the link value.  */
-  size_t buf_size = size < initial_limit ? size + 1 : initial_limit;
+  size_t buf_size = size && size < initial_limit ? size + 1 : initial_limit;
 
   while (1)
     {

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] areadlink-with-size: guess a lower bound with 0 size
  2019-07-04 10:44 ` [PATCH] stat: don't explicitly request file size for filenames Pádraig Brady
@ 2019-07-04 11:24   ` Pádraig Brady
  2019-07-05 15:43     ` Jim Meyering
  2019-07-05 22:59     ` Bruno Haible
  2019-07-04 19:39   ` [PATCH] stat: don't explicitly request file size for filenames Andreas Dilger
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Pádraig Brady @ 2019-07-04 11:24 UTC (permalink / raw)
  To: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 122 bytes --]

BTW this would allocate more for empty symlinks,
but they're rare: https://lwn.net/Articles/551224/

cheers,
Pádraig

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: areadlink-zero-size.patch --]
[-- Type: text/x-patch; name="areadlink-zero-size.patch", Size: 3048 bytes --]

From 6be031b4c9d6cb742a010dbe3fe38f77fe515fec Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <P@draigBrady.com>
Date: Thu, 4 Jul 2019 11:50:16 +0100
Subject: [PATCH] areadlink-with-size: guess a lower bound with 0 size

* lib/areadlink-with-size.c (areadlink_with_size): The size
is usually taken from st_size, which can be zero, resulting
in inefficient operation as seen with:

  $ strace -e readlink stat -c %N /proc/$$/cwd
  readlink("/proc/9036/cwd", "/", 1)      = 1
  readlink("/proc/9036/cwd", "/h", 2)     = 2
  readlink("/proc/9036/cwd", "/hom", 4)   = 4
  readlink("/proc/9036/cwd", "/home/pa", 8) = 8
  readlink("/proc/9036/cwd", "/home/padraig", 16) = 13

Instead let zero select an appropriate lower bound,
as was already done for sizes more than 8Ki.
We also change SYMLINK_MAX to 1023 so that the initial
allocation is a power of two.
---
 ChangeLog                 | 10 ++++++++++
 lib/areadlink-with-size.c |  7 ++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index ea2e86a..bbd91f0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2019-07-04  Pádraig Brady  <P@draigBrady.com>
+
+	areadlink-with-size: guess a lower bound with 0 size
+	* lib/areadlink-with-size.c (areadlink_with_size):
+	SIZE is usually taken from st_size, which can be zero.
+	Instead let zero select an appropriate lower bound,
+	as was already done for sizes more than 8Ki.
+	We also change SYMLINK_MAX to 1023 so that the initial
+	allocation is a power of two.
+
 2019-07-03  Bruno Haible  <bruno@clisp.org>
 
 	mbrtowc: Fix invalid use of mbtowc() on MSVC.
diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
index eacad3f..2fbe51c 100644
--- a/lib/areadlink-with-size.c
+++ b/lib/areadlink-with-size.c
@@ -36,14 +36,15 @@
    check, so it's OK to guess too small on hosts where there is no
    arbitrary limit to symbolic link length.  */
 #ifndef SYMLINK_MAX
-# define SYMLINK_MAX 1024
+# define SYMLINK_MAX 1023
 #endif
 
 #define MAXSIZE (SIZE_MAX < SSIZE_MAX ? SIZE_MAX : SSIZE_MAX)
 
 /* Call readlink to get the symbolic link value of FILE.
    SIZE is a hint as to how long the link is expected to be;
-   typically it is taken from st_size.  It need not be correct.
+   typically it is taken from st_size.  It need not be correct,
+   and a value of 0 (or more than 8Ki) will select an appropriate lower bound.
    Return a pointer to that NUL-terminated string in malloc'd storage.
    If readlink fails, malloc fails, or if the link value is longer
    than SSIZE_MAX, return NULL (caller may use errno to diagnose).  */
@@ -61,7 +62,7 @@ areadlink_with_size (char const *file, size_t size)
                           : INITIAL_LIMIT_BOUND);
 
   /* The initial buffer size for the link value.  */
-  size_t buf_size = size < initial_limit ? size + 1 : initial_limit;
+  size_t buf_size = size && size < initial_limit ? size + 1 : initial_limit;
 
   while (1)
     {
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] stat: don't explicitly request file size for filenames
  2019-07-04 10:44 ` [PATCH] stat: don't explicitly request file size for filenames Pádraig Brady
  2019-07-04 11:24   ` [PATCH] areadlink-with-size: guess a lower bound with 0 size Pádraig Brady
@ 2019-07-04 19:39   ` Andreas Dilger
  2019-07-04 23:19   ` Andreas Dilger
  2019-07-05 23:53   ` Paul Eggert
  3 siblings, 0 replies; 13+ messages in thread
From: Andreas Dilger @ 2019-07-04 19:39 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: bug-gnulib, Jeff Layton, coreutils@gnu.org

On Jul 4, 2019, at 04:44, Pádraig Brady <P@draigBrady.com> wrote:
> 
> On 03/07/19 21:24, Andreas Dilger wrote:
>> When calling 'stat -c %N' to print the filename, don't explicitly
>> request the size of the file via statx(), as it may add overhead on
>> some filesystems.  The size is only needed to optimize an allocation
>> for the relatively rare case of reading a symlink name, and the worst
>> effect is a somewhat-too-large temporary buffer may be allocated for
>> areadlink_with_size(), or internal retries if buffer is too small.
>> 
>> The file size will be returned by statx() on most filesystems, even
>> if not requested, unless the filesystem considers this to be too
>> expensive for that file, in which case the tradeoff is worthwhile.
>> 
>> * src/stat.c: Don't explicitly request STATX_SIZE for filenames.
>> Start with a 1KB buffer for areadlink_with_size() if st_size unset.
>> ---
>> src/stat.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/src/stat.c b/src/stat.c
>> index ec0bb7d..c887013 100644
>> --- a/src/stat.c
>> +++ b/src/stat.c
>> @@ -1282,7 +1282,7 @@ fmt_to_mask (char fmt)
>>   switch (fmt)
>>     {
>>     case 'N':
>> -      return STATX_MODE|STATX_SIZE;
>> +      return STATX_MODE;
>>     case 'd':
>>     case 'D':
>>       return STATX_MODE;
>> @@ -1491,7 +1491,9 @@ print_stat (char *pformat, size_t prefix_len, unsigned int m,
>>       out_string (pformat, prefix_len, quoteN (filename));
>>       if (S_ISLNK (statbuf->st_mode))
>>         {
>> -          char *linkname = areadlink_with_size (filename, statbuf->st_size);
>> +          /* if statx() didn't set size, most symlinks are under 1KB */
>> +          char *linkname = areadlink_with_size (filename, statbuf->st_size ?:
>> +                                                1023);
> 
> It would be nice to have areadlink_with_size treat 0 as auto select some lower bound.
> There is already logic there, and it would be generally helpful,
> as st_size can often be 0, as shown with:

I had originally added code to handle this case by allocating a large buffer, but I
removed it to keep the patch minimal.  The benefit of passing the 1024-byte size from
the caller is that we know this is a temporary buffer and over-allocating 1KB for the
symlink is not kept in memory a long time.

That said, it doesn't make any sense to be allocating a 1-byte buffer.  I don't know
the glibc malloc routines, but in the kernel, allocating anything less than 32 bytes
is meaningless, since the minimum slab size is 32 bytes.

>  $ strace -e readlink stat -c %N /proc/$$/cwd
>  readlink("/proc/9036/cwd", "/", 1)      = 1
>  readlink("/proc/9036/cwd", "/h", 2)     = 2
>  readlink("/proc/9036/cwd", "/hom", 4)   = 4
>  readlink("/proc/9036/cwd", "/home/pa", 8) = 8
>  readlink("/proc/9036/cwd", "/home/padraig", 16) = 13
> 
> With the attached gnulib diff, we get:
> 
>  $ strace -e readlink git/coreutils/src/stat -c %N /proc/$$/cwd
>  readlink("/proc/12512/cwd", "/home/padraig", 1024) = 13
> 
> I'll push both later.
> 
> thanks!
> Pádraig
> <areadlink-zero-size.diff>

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud







^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] stat: don't explicitly request file size for filenames
  2019-07-04 10:44 ` [PATCH] stat: don't explicitly request file size for filenames Pádraig Brady
  2019-07-04 11:24   ` [PATCH] areadlink-with-size: guess a lower bound with 0 size Pádraig Brady
  2019-07-04 19:39   ` [PATCH] stat: don't explicitly request file size for filenames Andreas Dilger
@ 2019-07-04 23:19   ` Andreas Dilger
  2019-07-05  1:30     ` Bruno Haible
  2019-07-05  2:36     ` Bruno Haible
  2019-07-05 23:53   ` Paul Eggert
  3 siblings, 2 replies; 13+ messages in thread
From: Andreas Dilger @ 2019-07-04 23:19 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: bug-gnulib, Jeff Layton, coreutils@gnu.org


[-- Attachment #1.1: Type: text/plain, Size: 2430 bytes --]

On Jul 4, 2019, at 04:44, Pádraig Brady <P@draigBrady.com> wrote:
>
> diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
> index eacad3f..2fbe51c 100644
> --- a/lib/areadlink-with-size.c
> +++ b/lib/areadlink-with-size.c
> @@ -36,14 +36,15 @@
>     check, so it's OK to guess too small on hosts where there is no
>     arbitrary limit to symbolic link length.  */
>  #ifndef SYMLINK_MAX
> -# define SYMLINK_MAX 1024
> +# define SYMLINK_MAX 1023
>  #endif
>
>  #define MAXSIZE (SIZE_MAX < SSIZE_MAX ? SIZE_MAX : SSIZE_MAX)
>
>  /* Call readlink to get the symbolic link value of FILE.
>     SIZE is a hint as to how long the link is expected to be;
> -   typically it is taken from st_size.  It need not be correct.
> +   typically it is taken from st_size.  It need not be correct,
> +   and a value of 0 (or more than 8Ki) will select an appropriate lower bound.
>     Return a pointer to that NUL-terminated string in malloc'd storage.
>     If readlink fails, malloc fails, or if the link value is longer
>     than SSIZE_MAX, return NULL (caller may use errno to diagnose).  */
> @@ -61,7 +62,7 @@ areadlink_with_size (char const *file, size_t size)
>                            : INITIAL_LIMIT_BOUND);
>
>    /* The initial buffer size for the link value.  */
> -  size_t buf_size = size < initial_limit ? size + 1 : initial_limit;
> +  size_t buf_size = size && size < initial_limit ? size + 1 : initial_limit;
>
>    while (1)

I'd thought of something similar with the 1KB limit, but then I was worried if
"ls" or something is saving all of the symlink targets in memory that using a
too-large buffer size would cause excess memory to be allocated for a long time.
The above logic (AFAICS) is intended to limit the *maximum* allocation size if
"size" is a bogus value, but I don't think initial_limit is a good lower bound.

I picked the lower bound of the malloc() (32 bytes = 4 * sizeof(void *)) as
documented in https://sourceware.org/glibc/wiki/MallocInternals#What_is_a_Chunk.3F
My version of the patch (merged with your comments) using this size is attached.

My stats show over 99% of symlinks are under 128 bytes, which isn't _too_ much
to allocate for every file.  If we want this code to very efficient (minimize
readlink() calls) at the cost of some RAM, it would be enough to use 127 bytes
as the starting point.

Cheers, Andreas








[-- Attachment #1.2: Type: text/html, Size: 3695 bytes --]

[-- Attachment #2: 0001-areadlink_with_size-handle-size-0-better.patch --]
[-- Type: application/octet-stream, Size: 2620 bytes --]

From 9955c9456df52505b657b09b9caca1588b36f4ba Mon Sep 17 00:00:00 2001
From: Andreas Dilger <adilger@whamcloud.com>
Date: Thu, 4 Jul 2019 15:28:15 -0600
Subject: [PATCH] areadlink_with_size: handle size == 0 better

If size is unset/zero when passed to areadlink_with_size(),
the buffer allocation and while loop start out with buf_size = 1
and in most cases symlink length is considerably larger so there
are several pointless loops while the buffer size is grown:

    $ strace -e readlink stat -c %N /proc/$$/cwd
     readlink("/proc/9036/cwd", "/", 1)      = 1
     readlink("/proc/9036/cwd", "/h", 2)     = 2
     readlink("/proc/9036/cwd", "/hom", 4)   = 4
     readlink("/proc/9036/cwd", "/home/pa", 8) = 8
     readlink("/proc/9036/cwd", "/home/padraig", 16) = 13

Also, there is no point in allocating a buffer under 32 bytes.

* src/areadlink-with-size.c: initialize buf_size=32 if size=0
---
 lib/areadlink-with-size.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
index eacad3f..b49e05c 100644
--- a/lib/areadlink-with-size.c
+++ b/lib/areadlink-with-size.c
@@ -36,14 +36,15 @@
    check, so it's OK to guess too small on hosts where there is no
    arbitrary limit to symbolic link length.  */
 #ifndef SYMLINK_MAX
-# define SYMLINK_MAX 1024
+# define SYMLINK_MAX 1023
 #endif
 
 #define MAXSIZE (SIZE_MAX < SSIZE_MAX ? SIZE_MAX : SSIZE_MAX)
 
 /* Call readlink to get the symbolic link value of FILE.
    SIZE is a hint as to how long the link is expected to be;
-   typically it is taken from st_size.  It need not be correct.
+   typically it is taken from st_size.  It need not be correct, and a
+   value of 0 (or more than 8Ki) will select an appropriate lower bound.
    Return a pointer to that NUL-terminated string in malloc'd storage.
    If readlink fails, malloc fails, or if the link value is longer
    than SSIZE_MAX, return NULL (caller may use errno to diagnose).  */
@@ -60,8 +61,10 @@ areadlink_with_size (char const *file, size_t size)
                           ? symlink_max + 1
                           : INITIAL_LIMIT_BOUND);
 
-  /* The initial buffer size for the link value.  */
-  size_t buf_size = size < initial_limit ? size + 1 : initial_limit;
+  /* The initial buffer size for the link value. If size is unset, there is
+     no benefit to malloc() a buffer under 32 bytes, and it avoids several
+     iterations of the while loop compared to starting with size == 1.  */
+  size_t buf_size = size < initial_limit ? (size ?: 31) + 1 : initial_limit;
 
   while (1)
     {
-- 
2.4.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] stat: don't explicitly request file size for filenames
  2019-07-04 23:19   ` Andreas Dilger
@ 2019-07-05  1:30     ` Bruno Haible
  2019-07-05  2:36     ` Bruno Haible
  1 sibling, 0 replies; 13+ messages in thread
From: Bruno Haible @ 2019-07-05  1:30 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Jeff Layton, Andreas Dilger, coreutils@gnu.org

Hi,

Andreas Dilger wrote:
> +  size_t buf_size = size < initial_limit ? (size ?: 31) + 1 : initial_limit;

This syntax is specific to GCC [1]. In patches for gnulib, please stick to
portable C99 syntax.

Thanks.

Bruno

[1] https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Conditionals.html



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] stat: don't explicitly request file size for filenames
  2019-07-04 23:19   ` Andreas Dilger
  2019-07-05  1:30     ` Bruno Haible
@ 2019-07-05  2:36     ` Bruno Haible
  2019-07-05 22:56       ` Bruno Haible
  1 sibling, 1 reply; 13+ messages in thread
From: Bruno Haible @ 2019-07-05  2:36 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Jeff Layton, Andreas Dilger, coreutils@gnu.org

[-- Attachment #1: Type: text/plain, Size: 1410 bytes --]

Andreas Dilger wrote:
> I was worried if
> "ls" or something is saving all of the symlink targets in memory that using a
> too-large buffer size would cause excess memory to be allocated for a long time.

Good point. careadlinkat.c has a "Shrink BUF before returning it." logic,
but not all similar functions in gnulib do.

As the attached test program shows, some platforms (glibc, Cygwin) have a
special optimization for a shrinking-realloc of the last malloc()ed block.
Other platforms don't have this and a shrinking-realloc moves the data
if there is a significant gain in space by doing so.

Therefore I think it would be good to add a "Shrink BUF before returning it."
logic also to
  areadlink-with-size.c
  areadlinkat-with-size.c
  etc.

Something like this:

diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
index eacad3f..364cc08 100644
--- a/lib/areadlink-with-size.c
+++ b/lib/areadlink-with-size.c
@@ -87,6 +87,13 @@ areadlink_with_size (char const *file, size_t size)
       if (link_length < buf_size)
         {
           buffer[link_length] = 0;
+          /* Shrink BUFFER before returning it.  */
+          if (link_length + 1 < buf_size)
+            {
+              char *shrinked_buffer = realloc (buffer, link_length + 1);
+              if (shrinked_buffer != NULL)
+                buffer = shrinked_buffer;
+            }
           return buffer;
         }
 

[-- Attachment #2: foo.c --]
[-- Type: text/x-csrc, Size: 13502 bytes --]

#include <stdlib.h>
#include <stdio.h>

int main ()
{
  size_t sizes[] = {
     1, 2, 3, 4, 5, 6,
     7, 8, 9, 11,
     15, 16, 17, 23,
     31, 32, 33, 45,
     63, 64, 65, 91,
     127, 128, 129, 181,
     255, 256, 257, 362,
     511, 512, 513, 724,
     1023, 1024, 1025, 1448,
     2047, 2048, 2049, 2896,
     4095, 4096, 4097, 5793,
     8191, 8192, 8193, 11585,
     16383, 16384, 16385, 23170,
     32767, 32768, 32769, 46341,
     65535, 65536, 65537
  };
  int i;
  for (i = 0; i < sizeof (sizes)/sizeof(sizes[0]); i++) {
    size_t size = sizes[i];
    int j;
    for (j = size; j > 0; j--)
      {
        char *block = malloc (size);
        char *new_block = realloc (block, j);
        if (new_block == NULL)
          abort ();
        free (new_block);
        if (new_block != block)
          break;
      }
    printf ("initial size = %u: moved in memory for size <= %u\n", (unsigned int) size, (unsigned int) j);
  }
}

/*
glibc 2.23, Cygwin:
initial size = 1: moved in memory for size <= 0
initial size = 2: moved in memory for size <= 0
initial size = 3: moved in memory for size <= 0
initial size = 4: moved in memory for size <= 0
initial size = 5: moved in memory for size <= 0
initial size = 6: moved in memory for size <= 0
initial size = 7: moved in memory for size <= 0
initial size = 8: moved in memory for size <= 0
initial size = 9: moved in memory for size <= 0
initial size = 11: moved in memory for size <= 0
initial size = 15: moved in memory for size <= 0
initial size = 16: moved in memory for size <= 0
initial size = 17: moved in memory for size <= 0
initial size = 23: moved in memory for size <= 0
initial size = 31: moved in memory for size <= 0
initial size = 32: moved in memory for size <= 0
initial size = 33: moved in memory for size <= 0
initial size = 45: moved in memory for size <= 0
initial size = 63: moved in memory for size <= 0
initial size = 64: moved in memory for size <= 0
initial size = 65: moved in memory for size <= 0
initial size = 91: moved in memory for size <= 0
initial size = 127: moved in memory for size <= 0
initial size = 128: moved in memory for size <= 0
initial size = 129: moved in memory for size <= 0
initial size = 181: moved in memory for size <= 0
initial size = 255: moved in memory for size <= 0
initial size = 256: moved in memory for size <= 0
initial size = 257: moved in memory for size <= 0
initial size = 362: moved in memory for size <= 0
initial size = 511: moved in memory for size <= 0
initial size = 512: moved in memory for size <= 0
initial size = 513: moved in memory for size <= 0
initial size = 724: moved in memory for size <= 0
initial size = 1023: moved in memory for size <= 0
initial size = 1024: moved in memory for size <= 0
initial size = 1025: moved in memory for size <= 0
initial size = 1448: moved in memory for size <= 0
initial size = 2047: moved in memory for size <= 0
initial size = 2048: moved in memory for size <= 0
initial size = 2049: moved in memory for size <= 0
initial size = 2896: moved in memory for size <= 0
initial size = 4095: moved in memory for size <= 0
initial size = 4096: moved in memory for size <= 0
initial size = 4097: moved in memory for size <= 0
initial size = 5793: moved in memory for size <= 0
initial size = 8191: moved in memory for size <= 0
initial size = 8192: moved in memory for size <= 0
initial size = 8193: moved in memory for size <= 0
initial size = 11585: moved in memory for size <= 0
initial size = 16383: moved in memory for size <= 0
initial size = 16384: moved in memory for size <= 0
initial size = 16385: moved in memory for size <= 0
initial size = 23170: moved in memory for size <= 0
initial size = 32767: moved in memory for size <= 0
initial size = 32768: moved in memory for size <= 0
initial size = 32769: moved in memory for size <= 0
initial size = 46341: moved in memory for size <= 0
initial size = 65535: moved in memory for size <= 0
initial size = 65536: moved in memory for size <= 0
initial size = 65537: moved in memory for size <= 0

FreeBSD 12:
initial size = 1: moved in memory for size <= 0
initial size = 2: moved in memory for size <= 0
initial size = 3: moved in memory for size <= 0
initial size = 4: moved in memory for size <= 0
initial size = 5: moved in memory for size <= 0
initial size = 6: moved in memory for size <= 0
initial size = 7: moved in memory for size <= 0
initial size = 8: moved in memory for size <= 0
initial size = 9: moved in memory for size <= 8
initial size = 11: moved in memory for size <= 8
initial size = 15: moved in memory for size <= 8
initial size = 16: moved in memory for size <= 8
initial size = 17: moved in memory for size <= 16
initial size = 23: moved in memory for size <= 16
initial size = 31: moved in memory for size <= 16
initial size = 32: moved in memory for size <= 16
initial size = 33: moved in memory for size <= 32
initial size = 45: moved in memory for size <= 32
initial size = 63: moved in memory for size <= 48
initial size = 64: moved in memory for size <= 48
initial size = 65: moved in memory for size <= 64
initial size = 91: moved in memory for size <= 80
initial size = 127: moved in memory for size <= 112
initial size = 128: moved in memory for size <= 112
initial size = 129: moved in memory for size <= 128
initial size = 181: moved in memory for size <= 160
initial size = 255: moved in memory for size <= 224
initial size = 256: moved in memory for size <= 224
initial size = 257: moved in memory for size <= 256
initial size = 362: moved in memory for size <= 320
initial size = 511: moved in memory for size <= 448
initial size = 512: moved in memory for size <= 448
initial size = 513: moved in memory for size <= 512
initial size = 724: moved in memory for size <= 640
initial size = 1023: moved in memory for size <= 896
initial size = 1024: moved in memory for size <= 896
initial size = 1025: moved in memory for size <= 1024
initial size = 1448: moved in memory for size <= 1280
initial size = 2047: moved in memory for size <= 1792
initial size = 2048: moved in memory for size <= 1792
initial size = 2049: moved in memory for size <= 2048
initial size = 2896: moved in memory for size <= 2560
initial size = 4095: moved in memory for size <= 3584
initial size = 4096: moved in memory for size <= 3584
initial size = 4097: moved in memory for size <= 4096
initial size = 5793: moved in memory for size <= 5120
initial size = 8191: moved in memory for size <= 7168
initial size = 8192: moved in memory for size <= 7168
initial size = 8193: moved in memory for size <= 8192
initial size = 11585: moved in memory for size <= 10240
initial size = 16383: moved in memory for size <= 14336
initial size = 16384: moved in memory for size <= 14336
initial size = 16385: moved in memory for size <= 14336
initial size = 23170: moved in memory for size <= 14336
initial size = 32767: moved in memory for size <= 14336
initial size = 32768: moved in memory for size <= 14336
initial size = 32769: moved in memory for size <= 14336
initial size = 46341: moved in memory for size <= 14336
initial size = 65535: moved in memory for size <= 14336
initial size = 65536: moved in memory for size <= 14336
initial size = 65537: moved in memory for size <= 14336

Solaris 10:
initial size = 1: moved in memory for size <= 0
initial size = 2: moved in memory for size <= 0
initial size = 3: moved in memory for size <= 0
initial size = 4: moved in memory for size <= 0
initial size = 5: moved in memory for size <= 0
initial size = 6: moved in memory for size <= 0
initial size = 7: moved in memory for size <= 0
initial size = 8: moved in memory for size <= 0
initial size = 9: moved in memory for size <= 8
initial size = 11: moved in memory for size <= 8
initial size = 15: moved in memory for size <= 8
initial size = 16: moved in memory for size <= 8
initial size = 17: moved in memory for size <= 16
initial size = 23: moved in memory for size <= 16
initial size = 31: moved in memory for size <= 24
initial size = 32: moved in memory for size <= 24
initial size = 33: moved in memory for size <= 32
initial size = 45: moved in memory for size <= 32
initial size = 63: moved in memory for size <= 32
initial size = 64: moved in memory for size <= 32
initial size = 65: moved in memory for size <= 32
initial size = 91: moved in memory for size <= 32
initial size = 127: moved in memory for size <= 32
initial size = 128: moved in memory for size <= 32
initial size = 129: moved in memory for size <= 32
initial size = 181: moved in memory for size <= 32
initial size = 255: moved in memory for size <= 32
initial size = 256: moved in memory for size <= 32
initial size = 257: moved in memory for size <= 32
initial size = 362: moved in memory for size <= 32
initial size = 511: moved in memory for size <= 32
initial size = 512: moved in memory for size <= 32
initial size = 513: moved in memory for size <= 32
initial size = 724: moved in memory for size <= 32
initial size = 1023: moved in memory for size <= 32
initial size = 1024: moved in memory for size <= 32
initial size = 1025: moved in memory for size <= 32
initial size = 1448: moved in memory for size <= 32
initial size = 2047: moved in memory for size <= 32
initial size = 2048: moved in memory for size <= 32
initial size = 2049: moved in memory for size <= 32
initial size = 2896: moved in memory for size <= 32
initial size = 4095: moved in memory for size <= 32
initial size = 4096: moved in memory for size <= 32
initial size = 4097: moved in memory for size <= 32
initial size = 5793: moved in memory for size <= 32
initial size = 8191: moved in memory for size <= 32
initial size = 8192: moved in memory for size <= 32
initial size = 8193: moved in memory for size <= 32
initial size = 11585: moved in memory for size <= 32
initial size = 16383: moved in memory for size <= 32
initial size = 16384: moved in memory for size <= 32
initial size = 16385: moved in memory for size <= 32
initial size = 23170: moved in memory for size <= 32
initial size = 32767: moved in memory for size <= 32
initial size = 32768: moved in memory for size <= 32
initial size = 32769: moved in memory for size <= 32
initial size = 46341: moved in memory for size <= 32
initial size = 65535: moved in memory for size <= 32
initial size = 65536: moved in memory for size <= 32
initial size = 65537: moved in memory for size <= 32

MSVC:
initial size = 1: moved in memory for size <= 0
initial size = 2: moved in memory for size <= 0
initial size = 3: moved in memory for size <= 0
initial size = 4: moved in memory for size <= 2
initial size = 5: moved in memory for size <= 5
initial size = 6: moved in memory for size <= 6
initial size = 7: moved in memory for size <= 7
initial size = 8: moved in memory for size <= 8
initial size = 9: moved in memory for size <= 9
initial size = 11: moved in memory for size <= 11
initial size = 15: moved in memory for size <= 15
initial size = 16: moved in memory for size <= 16
initial size = 17: moved in memory for size <= 17
initial size = 23: moved in memory for size <= 23
initial size = 31: moved in memory for size <= 18
initial size = 32: moved in memory for size <= 32
initial size = 33: moved in memory for size <= 0
initial size = 45: moved in memory for size <= 33
initial size = 63: moved in memory for size <= 55
initial size = 64: moved in memory for size <= 64
initial size = 65: moved in memory for size <= 58
initial size = 91: moved in memory for size <= 76
initial size = 127: moved in memory for size <= 122
initial size = 128: moved in memory for size <= 128
initial size = 129: moved in memory for size <= 127
initial size = 181: moved in memory for size <= 178
initial size = 255: moved in memory for size <= 252
initial size = 256: moved in memory for size <= 256
initial size = 257: moved in memory for size <= 254
initial size = 362: moved in memory for size <= 359
initial size = 511: moved in memory for size <= 506
initial size = 512: moved in memory for size <= 512
initial size = 513: moved in memory for size <= 509
initial size = 724: moved in memory for size <= 721
initial size = 1023: moved in memory for size <= 1020
initial size = 1024: moved in memory for size <= 1024
initial size = 1025: moved in memory for size <= 1022
initial size = 1448: moved in memory for size <= 1445
initial size = 2047: moved in memory for size <= 2043
initial size = 2048: moved in memory for size <= 2048
initial size = 2049: moved in memory for size <= 2046
initial size = 2896: moved in memory for size <= 2893
initial size = 4095: moved in memory for size <= 4091
initial size = 4096: moved in memory for size <= 4096
initial size = 4097: moved in memory for size <= 4094
initial size = 5793: moved in memory for size <= 5790
initial size = 8191: moved in memory for size <= 8188
initial size = 8192: moved in memory for size <= 8192
initial size = 8193: moved in memory for size <= 8190
initial size = 11585: moved in memory for size <= 11582
initial size = 16383: moved in memory for size <= 16380
initial size = 16384: moved in memory for size <= 16384
initial size = 16385: moved in memory for size <= 0
initial size = 23170: moved in memory for size <= 0
initial size = 32767: moved in memory for size <= 0
initial size = 32768: moved in memory for size <= 0
initial size = 32769: moved in memory for size <= 0
initial size = 46341: moved in memory for size <= 0
initial size = 65535: moved in memory for size <= 0
initial size = 65536: moved in memory for size <= 0
initial size = 65537: moved in memory for size <= 0

*/

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] areadlink-with-size: guess a lower bound with 0 size
  2019-07-04 11:24   ` [PATCH] areadlink-with-size: guess a lower bound with 0 size Pádraig Brady
@ 2019-07-05 15:43     ` Jim Meyering
  2019-07-05 22:59     ` Bruno Haible
  1 sibling, 0 replies; 13+ messages in thread
From: Jim Meyering @ 2019-07-05 15:43 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: bug-gnulib

On Thu, Jul 4, 2019 at 4:24 AM Pádraig Brady <P@draigbrady.com> wrote:
> BTW this would allocate more for empty symlinks,
> but they're rare: https://lwn.net/Articles/551224/

Interesting (re)reading. Patch looks fine. Thanks.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] stat: don't explicitly request file size for filenames
  2019-07-05  2:36     ` Bruno Haible
@ 2019-07-05 22:56       ` Bruno Haible
  0 siblings, 0 replies; 13+ messages in thread
From: Bruno Haible @ 2019-07-05 22:56 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Jeff Layton, Andreas Dilger, coreutils@gnu.org

[-- Attachment #1: Type: text/plain, Size: 786 bytes --]

> Therefore I think it would be good to add a "Shrink BUF before returning it."
> logic also to
>   areadlink-with-size.c
>   areadlinkat-with-size.c
>   etc.

Done as follows:
- For areadlink-with-size, areadlinkat-with-size, xgethostname, xgetdomainname
  by shrinking the result before returning it.
- In getcwd-lgpl and getcwd, shrinking is already done. But it may call
  realloc (P, SIZE) when the block allocated is already of size SIZE. It
  ought to be a no-op, but actually isn't on all platforms (on MSVC when
  SIZE is one of 8, 16, 32, ..., 16384).
- I'm not touching getdelim.c, because the calling convention of this function
  is made for calling getdelim() in a loop, and the buffer is meant to
  occasionally grow, not constantly grow and shrink and grow shrink ...



[-- Attachment #2: 0001-areadlink-with-size-Don-t-return-an-excessive-memory.patch --]
[-- Type: text/x-patch, Size: 1611 bytes --]

From eb76defb19b3cc01e12d77d8cc96d402b9b5097d Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Sat, 6 Jul 2019 00:39:07 +0200
Subject: [PATCH 1/5] areadlink-with-size: Don't return an excessive memory
 allocation.

Reported by Andreas Dilger <adilger@whamcloud.com>.

* lib/areadlink-with-size.c (areadlink_with_size): Shrink the buffer
before returning it.
---
 ChangeLog                 | 7 +++++++
 lib/areadlink-with-size.c | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index d6209ef..c0ee6ce 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2019-07-05  Bruno Haible  <bruno@clisp.org>
+
+	areadlink-with-size: Don't return an excessive memory allocation.
+	Reported by Andreas Dilger <adilger@whamcloud.com>.
+	* lib/areadlink-with-size.c (areadlink_with_size): Shrink the buffer
+	before returning it.
+
 2019-07-03  Bruno Haible  <bruno@clisp.org>
 
 	renameatu: Fix test failure on MSVC.
diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
index eacad3f..364cc08 100644
--- a/lib/areadlink-with-size.c
+++ b/lib/areadlink-with-size.c
@@ -87,6 +87,13 @@ areadlink_with_size (char const *file, size_t size)
       if (link_length < buf_size)
         {
           buffer[link_length] = 0;
+          /* Shrink BUFFER before returning it.  */
+          if (link_length + 1 < buf_size)
+            {
+              char *shrinked_buffer = realloc (buffer, link_length + 1);
+              if (shrinked_buffer != NULL)
+                buffer = shrinked_buffer;
+            }
           return buffer;
         }
 
-- 
2.7.4


[-- Attachment #3: 0002-areadlinkat-with-size-Don-t-return-an-excessive-memo.patch --]
[-- Type: text/x-patch, Size: 1691 bytes --]

From 8f91b5cf3514d664289afaace0d6f832b9608f20 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Sat, 6 Jul 2019 00:40:34 +0200
Subject: [PATCH 2/5] areadlinkat-with-size: Don't return an excessive memory
 allocation.

* lib/areadlinkat-with-size.c (areadlinkat_with_size): Shrink the buffer
before returning it.
---
 ChangeLog                   | 6 ++++++
 lib/areadlinkat-with-size.c | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index c0ee6ce..d091eb1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
 2019-07-05  Bruno Haible  <bruno@clisp.org>
 
+	areadlinkat-with-size: Don't return an excessive memory allocation.
+	* lib/areadlinkat-with-size.c (areadlinkat_with_size): Shrink the buffer
+	before returning it.
+
+2019-07-05  Bruno Haible  <bruno@clisp.org>
+
 	areadlink-with-size: Don't return an excessive memory allocation.
 	Reported by Andreas Dilger <adilger@whamcloud.com>.
 	* lib/areadlink-with-size.c (areadlink_with_size): Shrink the buffer
diff --git a/lib/areadlinkat-with-size.c b/lib/areadlinkat-with-size.c
index ed00b98..5b2bccc 100644
--- a/lib/areadlinkat-with-size.c
+++ b/lib/areadlinkat-with-size.c
@@ -92,6 +92,13 @@ areadlinkat_with_size (int fd, char const *file, size_t size)
       if (link_length < buf_size)
         {
           buffer[link_length] = 0;
+          /* Shrink BUFFER before returning it.  */
+          if (link_length + 1 < buf_size)
+            {
+              char *shrinked_buffer = realloc (buffer, link_length + 1);
+              if (shrinked_buffer != NULL)
+                buffer = shrinked_buffer;
+            }
           return buffer;
         }
 
-- 
2.7.4


[-- Attachment #4: 0003-xgethostname-Don-t-return-an-excessive-memory-alloca.patch --]
[-- Type: text/x-patch, Size: 1479 bytes --]

From 98802d166215602a88988ae566259edf4f67efe8 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Sat, 6 Jul 2019 00:43:01 +0200
Subject: [PATCH 3/5] xgethostname: Don't return an excessive memory
 allocation.

* lib/xgethostname.c (xgethostname): Shrink the hostname buffer before
returning it.
---
 ChangeLog          |  6 ++++++
 lib/xgethostname.c | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index d091eb1..c8dd18a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
 2019-07-05  Bruno Haible  <bruno@clisp.org>
 
+	xgethostname: Don't return an excessive memory allocation.
+	* lib/xgethostname.c (xgethostname): Shrink the hostname buffer before
+	returning it.
+
+2019-07-05  Bruno Haible  <bruno@clisp.org>
+
 	areadlinkat-with-size: Don't return an excessive memory allocation.
 	* lib/areadlinkat-with-size.c (areadlinkat_with_size): Shrink the buffer
 	before returning it.
diff --git a/lib/xgethostname.c b/lib/xgethostname.c
index eba34b8..4bcb00f 100644
--- a/lib/xgethostname.c
+++ b/lib/xgethostname.c
@@ -70,5 +70,16 @@ xgethostname (void)
         }
     }
 
+  /* Shrink HOSTNAME before returning it.  */
+  {
+    size_t actual_size = strlen (hostname) + 1;
+    if (actual_size < size)
+      {
+        char *shrinked_hostname = realloc (hostname, actual_size);
+        if (shrinked_hostname != NULL)
+          hostname = shrinked_hostname;
+      }
+  }
+
   return hostname;
 }
-- 
2.7.4


[-- Attachment #5: 0004-xgetdomainname-Don-t-return-an-excessive-memory-allo.patch --]
[-- Type: text/x-patch, Size: 1545 bytes --]

From ac7390ac0884d155d3adc9d0f992413adcaaff87 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Sat, 6 Jul 2019 00:43:59 +0200
Subject: [PATCH 4/5] xgetdomainname: Don't return an excessive memory
 allocation.

* lib/xgetdomainname.c (xgetdomainname): Shrink the domainname buffer
before returning it.
---
 ChangeLog            |  6 ++++++
 lib/xgetdomainname.c | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index c8dd18a..1f5f18b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
 2019-07-05  Bruno Haible  <bruno@clisp.org>
 
+	xgetdomainname: Don't return an excessive memory allocation.
+	* lib/xgetdomainname.c (xgetdomainname): Shrink the domainname buffer
+	before returning it.
+
+2019-07-05  Bruno Haible  <bruno@clisp.org>
+
 	xgethostname: Don't return an excessive memory allocation.
 	* lib/xgethostname.c (xgethostname): Shrink the hostname buffer before
 	returning it.
diff --git a/lib/xgetdomainname.c b/lib/xgetdomainname.c
index 2f59008..65df990 100644
--- a/lib/xgetdomainname.c
+++ b/lib/xgetdomainname.c
@@ -73,5 +73,16 @@ xgetdomainname (void)
       domainname = xrealloc (domainname, size);
     }
 
+  /* Shrink DOMAINNAME before returning it.  */
+  {
+    size_t actual_size = strlen (domainname) + 1;
+    if (actual_size < size)
+      {
+        char *shrinked_domainname = realloc (domainname, actual_size);
+        if (shrinked_domainname != NULL)
+          domainname = shrinked_domainname;
+      }
+  }
+
   return domainname;
 }
-- 
2.7.4


[-- Attachment #6: 0005-getcwd-lgpl-getcwd-Don-t-call-realloc-when-it-is-poi.patch --]
[-- Type: text/x-patch, Size: 2256 bytes --]

From d44241344e8455c59ea04440706cfaf72862c404 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Sat, 6 Jul 2019 00:47:12 +0200
Subject: [PATCH 5/5] getcwd-lgpl, getcwd: Don't call realloc when it is
 pointless.

* lib/getcwd-lgpl.c (rpl_getcwd): Don't call realloc if the result's
needed size is equal to the allocated size.
* lib/getcwd.c (__getcwd): Likewise.
---
 ChangeLog         |  7 +++++++
 lib/getcwd-lgpl.c | 13 +++++++++----
 lib/getcwd.c      |  2 +-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 1f5f18b..98d5531 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,12 @@
 2019-07-05  Bruno Haible  <bruno@clisp.org>
 
+	getcwd-lgpl, getcwd: Don't call realloc when it is pointless.
+	* lib/getcwd-lgpl.c (rpl_getcwd): Don't call realloc if the result's
+	needed size is equal to the allocated size.
+	* lib/getcwd.c (__getcwd): Likewise.
+
+2019-07-05  Bruno Haible  <bruno@clisp.org>
+
 	xgetdomainname: Don't return an excessive memory allocation.
 	* lib/xgetdomainname.c (xgetdomainname): Shrink the domainname buffer
 	before returning it.
diff --git a/lib/getcwd-lgpl.c b/lib/getcwd-lgpl.c
index b224cfc..1eaac78 100644
--- a/lib/getcwd-lgpl.c
+++ b/lib/getcwd-lgpl.c
@@ -115,10 +115,15 @@ rpl_getcwd (char *buf, size_t size)
     }
   else
     {
-      /* Trim to fit, if possible.  */
-      result = realloc (buf, strlen (buf) + 1);
-      if (!result)
-        result = buf;
+      /* Here result == buf.  */
+      /* Shrink result before returning it.  */
+      size_t actual_size = strlen (result) + 1;
+      if (actual_size < size)
+        {
+          char *shrinked_result = realloc (result, actual_size);
+          if (shrinked_result != NULL)
+            result = shrinked_result;
+        }
     }
   return result;
 }
diff --git a/lib/getcwd.c b/lib/getcwd.c
index 8f15f56..bf87809 100644
--- a/lib/getcwd.c
+++ b/lib/getcwd.c
@@ -443,7 +443,7 @@ __getcwd (char *buf, size_t size)
 
   if (size == 0)
     /* Ensure that the buffer is only as large as necessary.  */
-    buf = realloc (dir, used);
+    buf = (used < allocated ? realloc (dir, used) : dir);
 
   if (buf == NULL)
     /* Either buf was NULL all along, or 'realloc' failed but
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] areadlink-with-size: guess a lower bound with 0 size
  2019-07-04 11:24   ` [PATCH] areadlink-with-size: guess a lower bound with 0 size Pádraig Brady
  2019-07-05 15:43     ` Jim Meyering
@ 2019-07-05 22:59     ` Bruno Haible
  1 sibling, 0 replies; 13+ messages in thread
From: Bruno Haible @ 2019-07-05 22:59 UTC (permalink / raw)
  To: bug-gnulib

Pádraig Brady wrote:
> +	areadlink-with-size: guess a lower bound with 0 size

The patch looks good to me. Except that now, with the shrink-before-return in
place, the term "lower bound" is no longer appropriate. A better term is
probably "Initial memory allocation".

Bruno




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] stat: don't explicitly request file size for filenames
  2019-07-04 10:44 ` [PATCH] stat: don't explicitly request file size for filenames Pádraig Brady
                     ` (2 preceding siblings ...)
  2019-07-04 23:19   ` Andreas Dilger
@ 2019-07-05 23:53   ` Paul Eggert
  2019-07-06 19:10     ` [PATCH] areadlink-with-size: guess a buffer size with 0 size Pádraig Brady
  3 siblings, 1 reply; 13+ messages in thread
From: Paul Eggert @ 2019-07-05 23:53 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Jeff Layton, bug-gnulib, Andreas Dilger, coreutils@gnu.org

[-- Attachment #1: Type: text/plain, Size: 408 bytes --]

Pádraig Brady wrote:
> It would be nice to have areadlink_with_size treat 0 as auto select some lower bound.

Yes, that sounds good. However, I didn't see why that would entail changing 
SYMLINK_MAX from 1024 to 1023, or why the patch would affect the documented API.

How about the attached patch instead? When the guessed size is zero it typically 
avoids a realloc by using a small stack buffer.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-areadlink-with-size-improve-efficiency.patch --]
[-- Type: text/x-patch; name="0001-areadlink-with-size-improve-efficiency.patch", Size: 3643 bytes --]

From d94bf537d7fdda13f4432bb60a98a8bd19d8e18d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 5 Jul 2019 16:48:22 -0700
Subject: [PATCH] areadlink-with-size: improve efficiency
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If the guessed size is 0, guess 200 first, to avoid a sequence of
small readlinks in the usual case.  Reallocate at the end to the
actual size, to avoid memory waste.  Based on a suggestion by
Pádraig Brady.
* lib/areadlink-with-size.c (areadlink_with_size):
Use a small stack buffer when the stated size is zero.
* modules/areadlink-with-size (Depends-on): Add strdup-posix.
---
 ChangeLog                   | 11 +++++++++++
 lib/areadlink-with-size.c   | 26 ++++++++++++++++++++------
 modules/areadlink-with-size |  1 +
 3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 98d5531b1..b07e26335 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2019-07-05  Paul Eggert  <eggert@cs.ucla.edu>
+
+	areadlink-with-size: improve efficiency
+	If the guessed size is 0, guess 200 first, to avoid a sequence of
+	small readlinks in the usual case.  Reallocate at the end to the
+	actual size, to avoid memory waste.  Based on a suggestion by
+	Pádraig Brady.
+	* lib/areadlink-with-size.c (areadlink_with_size):
+	Use a small stack buffer when the stated size is zero.
+	* modules/areadlink-with-size (Depends-on): Add strdup-posix.
+
 2019-07-05  Bruno Haible  <bruno@clisp.org>
 
 	getcwd-lgpl, getcwd: Don't call realloc when it is pointless.
diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
index 364cc0858..9f1f7959f 100644
--- a/lib/areadlink-with-size.c
+++ b/lib/areadlink-with-size.c
@@ -60,18 +60,30 @@ areadlink_with_size (char const *file, size_t size)
                           ? symlink_max + 1
                           : INITIAL_LIMIT_BOUND);
 
+  /* Size of stack buffer for initial readlink when the link size hint
+     is zero.  */
+  enum { stackbuf_size = 200 };
+
   /* The initial buffer size for the link value.  */
-  size_t buf_size = size < initial_limit ? size + 1 : initial_limit;
+  size_t buf_size = (size == 0 ? stackbuf_size
+                     : size < initial_limit ? size + 1 : initial_limit);
 
   while (1)
     {
       ssize_t r;
       size_t link_length;
-      char *buffer = malloc (buf_size);
+      char stackbuf[stackbuf_size];
+      char *buf = stackbuf;
+      char *buffer = NULL;
+
+      if (! (size == 0 && buf_size == stackbuf_size))
+        {
+          buf = buffer = malloc (buf_size);
+          if (!buffer)
+            return NULL;
+        }
 
-      if (buffer == NULL)
-        return NULL;
-      r = readlink (file, buffer, buf_size);
+      r = readlink (file, buf, buf_size);
       link_length = r;
 
       /* On AIX 5L v5.3 and HP-UX 11i v2 04/09, readlink returns -1
@@ -86,7 +98,9 @@ areadlink_with_size (char const *file, size_t size)
 
       if (link_length < buf_size)
         {
-          buffer[link_length] = 0;
+          buf[link_length] = 0;
+          if (!buffer)
+            return strdup (buf);
           /* Shrink BUFFER before returning it.  */
           if (link_length + 1 < buf_size)
             {
diff --git a/modules/areadlink-with-size b/modules/areadlink-with-size
index 82a902187..a4ab0e178 100644
--- a/modules/areadlink-with-size
+++ b/modules/areadlink-with-size
@@ -9,6 +9,7 @@ Depends-on:
 readlink
 ssize_t
 stdint
+strdup-posix
 unistd
 
 configure.ac:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] areadlink-with-size: guess a buffer size with 0 size
  2019-07-05 23:53   ` Paul Eggert
@ 2019-07-06 19:10     ` Pádraig Brady
  2019-07-07  2:28       ` Paul Eggert
  0 siblings, 1 reply; 13+ messages in thread
From: Pádraig Brady @ 2019-07-06 19:10 UTC (permalink / raw)
  To: Paul Eggert; +Cc: bug-gnulib, Andreas Dilger

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

On 06/07/19 00:53, Paul Eggert wrote:
> Pádraig Brady wrote:
>> It would be nice to have areadlink_with_size treat 0 as auto select some lower bound.
> 
> Yes, that sounds good. However, I didn't see why that would entail changing 
> SYMLINK_MAX from 1024 to 1023, or why the patch would affect the documented API.
> 
> How about the attached patch instead? When the guessed size is zero it typically 
> avoids a realloc by using a small stack buffer.

Your patch has the advantage of allocating the exact right sized buffer
in the usual case, but the disadvantage of CPU overhead in string length determination,
and some extra code complexity in the separate small buffer handling.

Given Bruno's interim patch of shrinking to the exact sized buffer,
I've push the attached simpler patch that uses a starting buffer
of size 128 (suggested by Andreas), when SIZE==0 is specified.

cheers,
Pádraig

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: areadlink-zero-size.patch --]
[-- Type: text/x-patch; name="areadlink-zero-size.patch", Size: 2967 bytes --]

From 0ccc444f3d2dc3ad1b4d682f7d8403633942ed39 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <P@draigBrady.com>
Date: Sat, 6 Jul 2019 19:43:11 +0100
Subject: [PATCH] areadlink-with-size: guess a buffer size with 0 size

The size is usually taken from st_size, which can be zero,
resulting in inefficient operation as seen with:

  $ strace -e readlink stat -c %N /proc/$$/cwd
  readlink("/proc/9036/cwd", "/", 1)      = 1
  readlink("/proc/9036/cwd", "/h", 2)     = 2
  readlink("/proc/9036/cwd", "/hom", 4)   = 4
  readlink("/proc/9036/cwd", "/home/pa", 8) = 8
  readlink("/proc/9036/cwd", "/home/padraig", 16) = 13

Instead let zero select an initial memory allocation
of 128 bytes, which most symlinks fit within.

* lib/areadlink-with-size.c (areadlink_with_size):
Start with a 128 byte buffer, for SIZE == 0.
* lib/areadlinkat-with-size.c (areadlinkat_with_size): Likewise.
---
 ChangeLog                   | 11 +++++++++++
 lib/areadlink-with-size.c   |  3 ++-
 lib/areadlinkat-with-size.c |  3 ++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 885907d..0b06131 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2019-07-06  Pádraig Brady  <P@draigBrady.com>
+
+	areadlink-with-size: guess a buffer size with 0 size
+	The size is usually taken from st_size, which can be zero,
+	resulting in inefficient operation.
+	Instead let zero select an initial memory allocation
+	of 128 bytes, which most symlinks fit within.
+	* lib/areadlink-with-size.c (areadlink_with_size):
+	Start with a 128 byte buffer, for SIZE == 0.
+	* lib/areadlinkat-with-size.c (areadlinkat_with_size): Likewise.
+
 2019-07-06  Konstantin Kharlamov  <Hi-Angel@yandex.ru>
 
 	Replace manually crafted hex regexes with [:xdigit:]
diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
index 364cc08..b9cd05c 100644
--- a/lib/areadlink-with-size.c
+++ b/lib/areadlink-with-size.c
@@ -61,7 +61,8 @@ areadlink_with_size (char const *file, size_t size)
                           : INITIAL_LIMIT_BOUND);
 
   /* The initial buffer size for the link value.  */
-  size_t buf_size = size < initial_limit ? size + 1 : initial_limit;
+  size_t buf_size = (size == 0 ? 128
+                     : size < initial_limit ? size + 1 : initial_limit);
 
   while (1)
     {
diff --git a/lib/areadlinkat-with-size.c b/lib/areadlinkat-with-size.c
index 5b2bccc..d39096f 100644
--- a/lib/areadlinkat-with-size.c
+++ b/lib/areadlinkat-with-size.c
@@ -66,7 +66,8 @@ areadlinkat_with_size (int fd, char const *file, size_t size)
                           : INITIAL_LIMIT_BOUND);
 
   /* The initial buffer size for the link value.  */
-  size_t buf_size = size < initial_limit ? size + 1 : initial_limit;
+  size_t buf_size = (size == 0 ? 128
+                     : size < initial_limit ? size + 1 : initial_limit);
 
   while (1)
     {
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] areadlink-with-size: guess a buffer size with 0 size
  2019-07-06 19:10     ` [PATCH] areadlink-with-size: guess a buffer size with 0 size Pádraig Brady
@ 2019-07-07  2:28       ` Paul Eggert
  2019-07-19  3:24         ` Bruno Haible
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Eggert @ 2019-07-07  2:28 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: Gnulib bugs, Andreas Dilger

[-- Attachment #1: Type: text/plain, Size: 483 bytes --]

Pádraig Brady wrote:
> Your patch has the advantage of allocating the exact right sized buffer
> in the usual case, but the disadvantage of CPU overhead in string length determination,
> and some extra code complexity in the separate small buffer handling.

I think the code complexity is worth it, to avoid calling realloc in the typical 
case when the size is not known. The string length determination can easily be 
avoided, so I installed the attached which does that.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-areadlink-with-size-avoid-realloc-when-size-0.patch --]
[-- Type: text/x-patch; name="0001-areadlink-with-size-avoid-realloc-when-size-0.patch", Size: 5618 bytes --]

From b979980a653ce610fe5f64935adedda425da3ca4 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 6 Jul 2019 19:25:24 -0700
Subject: [PATCH] areadlink-with-size: avoid realloc when size==0

* lib/areadlink-with-size.c (areadlink_with_size):
* lib/areadlinkat-with-size.c (areadlinkat_with_size):
Reallocate at the end to the actual size, to avoid memory waste,
as suggested by Bruno Haible.  But when the guessed size is zero -
useful when the size is unknown - do the initial small readlink
into the stack, to avoid that realloc in the usual case.
---
 ChangeLog                   | 10 ++++++++++
 lib/areadlink-with-size.c   | 31 +++++++++++++++++++++++--------
 lib/areadlinkat-with-size.c | 31 +++++++++++++++++++++++--------
 3 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 0b061317c..f7e031d9b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2019-07-06  Paul Eggert  <eggert@cs.ucla.edu>
+
+	areadlink-with-size: avoid realloc when size==0
+	* lib/areadlink-with-size.c (areadlink_with_size):
+	* lib/areadlinkat-with-size.c (areadlinkat_with_size):
+	Reallocate at the end to the actual size, to avoid memory waste,
+	as suggested by Bruno Haible.  But when the guessed size is zero -
+	useful when the size is unknown - do the initial small readlink
+	into the stack, to avoid that realloc in the usual case.
+
 2019-07-06  Pádraig Brady  <P@draigBrady.com>
 
 	areadlink-with-size: guess a buffer size with 0 size
diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
index b9cd05cba..ae3c782a4 100644
--- a/lib/areadlink-with-size.c
+++ b/lib/areadlink-with-size.c
@@ -60,19 +60,28 @@ areadlink_with_size (char const *file, size_t size)
                           ? symlink_max + 1
                           : INITIAL_LIMIT_BOUND);
 
+  enum { stackbuf_size = 128 };
+
   /* The initial buffer size for the link value.  */
-  size_t buf_size = (size == 0 ? 128
+  size_t buf_size = (size == 0 ? stackbuf_size
                      : size < initial_limit ? size + 1 : initial_limit);
 
   while (1)
     {
       ssize_t r;
       size_t link_length;
-      char *buffer = malloc (buf_size);
+      char stackbuf[stackbuf_size];
+      char *buf = stackbuf;
+      char *buffer = NULL;
+
+      if (! (size == 0 && buf_size == stackbuf_size))
+        {
+          buf = buffer = malloc (buf_size);
+          if (!buffer)
+            return NULL;
+        }
 
-      if (buffer == NULL)
-        return NULL;
-      r = readlink (file, buffer, buf_size);
+      r = readlink (file, buf, buf_size);
       link_length = r;
 
       /* On AIX 5L v5.3 and HP-UX 11i v2 04/09, readlink returns -1
@@ -87,10 +96,16 @@ areadlink_with_size (char const *file, size_t size)
 
       if (link_length < buf_size)
         {
-          buffer[link_length] = 0;
-          /* Shrink BUFFER before returning it.  */
-          if (link_length + 1 < buf_size)
+          buf[link_length] = 0;
+          if (!buffer)
+            {
+              buffer = malloc (link_length + 1);
+              if (buffer)
+                return memcpy (buffer, buf, link_length + 1);
+            }
+          else if (link_length + 1 < buf_size)
             {
+              /* Shrink BUFFER before returning it.  */
               char *shrinked_buffer = realloc (buffer, link_length + 1);
               if (shrinked_buffer != NULL)
                 buffer = shrinked_buffer;
diff --git a/lib/areadlinkat-with-size.c b/lib/areadlinkat-with-size.c
index d39096f2b..ed46d5911 100644
--- a/lib/areadlinkat-with-size.c
+++ b/lib/areadlinkat-with-size.c
@@ -65,19 +65,28 @@ areadlinkat_with_size (int fd, char const *file, size_t size)
                           ? symlink_max + 1
                           : INITIAL_LIMIT_BOUND);
 
+  enum { stackbuf_size = 128 };
+
   /* The initial buffer size for the link value.  */
-  size_t buf_size = (size == 0 ? 128
+  size_t buf_size = (size == 0 ? stackbuf_size
                      : size < initial_limit ? size + 1 : initial_limit);
 
   while (1)
     {
       ssize_t r;
       size_t link_length;
-      char *buffer = malloc (buf_size);
+      char stackbuf[stackbuf_size];
+      char *buf = stackbuf;
+      char *buffer = NULL;
+
+      if (! (size == 0 && buf_size == stackbuf_size))
+        {
+          buf = buffer = malloc (buf_size);
+          if (!buffer)
+            return NULL;
+        }
 
-      if (buffer == NULL)
-        return NULL;
-      r = readlinkat (fd, file, buffer, buf_size);
+      r = readlinkat (fd, file, buf, buf_size);
       link_length = r;
 
       /* On AIX 5L v5.3 and HP-UX 11i v2 04/09, readlink returns -1
@@ -92,10 +101,16 @@ areadlinkat_with_size (int fd, char const *file, size_t size)
 
       if (link_length < buf_size)
         {
-          buffer[link_length] = 0;
-          /* Shrink BUFFER before returning it.  */
-          if (link_length + 1 < buf_size)
+          buf[link_length] = 0;
+          if (!buffer)
+            {
+              buffer = malloc (link_length + 1);
+              if (buffer)
+                return memcpy (buffer, buf, link_length + 1);
+            }
+          else if (link_length + 1 < buf_size)
             {
+              /* Shrink BUFFER before returning it.  */
               char *shrinked_buffer = realloc (buffer, link_length + 1);
               if (shrinked_buffer != NULL)
                 buffer = shrinked_buffer;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] areadlink-with-size: guess a buffer size with 0 size
  2019-07-07  2:28       ` Paul Eggert
@ 2019-07-19  3:24         ` Bruno Haible
  0 siblings, 0 replies; 13+ messages in thread
From: Bruno Haible @ 2019-07-19  3:24 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Andreas Dilger, Paul Eggert

Paul Eggert wrote on 2019-07-06:
> The string length determination can easily be 
> avoided, so I installed the attached which does that.

This patch produces GCC warnings about the use of memcpy().
And mine as well, about the use of strlen().

This patch fixes it:


2019-07-19  Bruno Haible  <bruno@clisp.org>

	areadlink-with-size, xgethostname, xgetdomainname: Fix GCC warning.
	* lib/areadlink-with-size.c: Include <string.h>.
	* lib/areadlinkat-with-size.c: Likewise.
	* lib/xgethostname.c: Likewise.
	* lib/xgetdomainname.c: Likewise.

diff --git a/lib/areadlink-with-size.c b/lib/areadlink-with-size.c
index ae3c782..95a28b3 100644
--- a/lib/areadlink-with-size.c
+++ b/lib/areadlink-with-size.c
@@ -26,6 +26,7 @@
 #include <limits.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <string.h>
 #include <unistd.h>
 
 #ifndef SSIZE_MAX
diff --git a/lib/areadlinkat-with-size.c b/lib/areadlinkat-with-size.c
index ed46d59..22d39a9 100644
--- a/lib/areadlinkat-with-size.c
+++ b/lib/areadlinkat-with-size.c
@@ -27,6 +27,7 @@
 #include <limits.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <string.h>
 #include <unistd.h>
 
 #if HAVE_READLINKAT
diff --git a/lib/xgetdomainname.c b/lib/xgetdomainname.c
index 65df990..a1accf6 100644
--- a/lib/xgetdomainname.c
+++ b/lib/xgetdomainname.c
@@ -28,6 +28,9 @@
 /* Get errno.  */
 #include <errno.h>
 
+/* Get strlen.  */
+#include <string.h>
+
 /* Get free.  */
 #include <stdlib.h>
 
diff --git a/lib/xgethostname.c b/lib/xgethostname.c
index 4bcb00f..5b3154a 100644
--- a/lib/xgethostname.c
+++ b/lib/xgethostname.c
@@ -25,6 +25,7 @@
 
 #include <stdlib.h>
 #include <errno.h>
+#include <string.h>
 #include <unistd.h>
 
 #include "xalloc.h"



^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-07-19  3:24 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <7AC7AA2A-EDEA-4A80-9A5A-02FAA2D823EF@whamcloud.com>
2019-07-04 10:44 ` [PATCH] stat: don't explicitly request file size for filenames Pádraig Brady
2019-07-04 11:24   ` [PATCH] areadlink-with-size: guess a lower bound with 0 size Pádraig Brady
2019-07-05 15:43     ` Jim Meyering
2019-07-05 22:59     ` Bruno Haible
2019-07-04 19:39   ` [PATCH] stat: don't explicitly request file size for filenames Andreas Dilger
2019-07-04 23:19   ` Andreas Dilger
2019-07-05  1:30     ` Bruno Haible
2019-07-05  2:36     ` Bruno Haible
2019-07-05 22:56       ` Bruno Haible
2019-07-05 23:53   ` Paul Eggert
2019-07-06 19:10     ` [PATCH] areadlink-with-size: guess a buffer size with 0 size Pádraig Brady
2019-07-07  2:28       ` Paul Eggert
2019-07-19  3:24         ` Bruno Haible

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).