From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bug-gnulib-bounces+normalperson=yhbt.net@gnu.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-Status: No, score=-5.5 required=3.0 tests=AWL,BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,
	RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS,
	SPF_PASS shortcircuit=no autolearn=unavailable autolearn_force=no
	version=3.4.2
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id 9DCA11F4B4
	for <normalperson@yhbt.net>; Sat,  2 Jan 2021 00:04:19 +0000 (UTC)
Received: from localhost ([::1]:46288 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <bug-gnulib-bounces+normalperson=yhbt.net@gnu.org>)
	id 1kvUP3-0007NX-L6
	for normalperson@yhbt.net; Fri, 01 Jan 2021 19:04:17 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:49958)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eggert@cs.ucla.edu>)
 id 1kvUOw-0007NH-L7
 for bug-gnulib@gnu.org; Fri, 01 Jan 2021 19:04:10 -0500
Received: from zimbra.cs.ucla.edu ([131.179.128.68]:47128)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eggert@cs.ucla.edu>)
 id 1kvUOt-0002Q3-U0
 for bug-gnulib@gnu.org; Fri, 01 Jan 2021 19:04:09 -0500
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id AE5FB16006F;
 Fri,  1 Jan 2021 16:04:04 -0800 (PST)
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id 8taflPQ-tbUs; Fri,  1 Jan 2021 16:04:03 -0800 (PST)
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id 65639160114;
 Fri,  1 Jan 2021 16:04:03 -0800 (PST)
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id xcANB2Dbd8ns; Fri,  1 Jan 2021 16:04:03 -0800 (PST)
Received: from [192.168.1.9] (cpe-23-243-218-95.socal.res.rr.com
 [23.243.218.95])
 by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 370E616006F;
 Fri,  1 Jan 2021 16:04:03 -0800 (PST)
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
References: <20201229193454.34558-1-adhemerval.zanella@linaro.org>
 <20201229193454.34558-5-adhemerval.zanella@linaro.org>
 <c65fa1a0-62ff-2649-d3e8-b365d010fb82@cs.ucla.edu>
 <e1a8ac31-6f77-a7fd-e892-78b6ee0faf63@linaro.org>
 <502b6d2d-1139-ca9d-14cf-00082adc915e@linaro.org>
From: Paul Eggert <eggert@cs.ucla.edu>
Organization: UCLA Computer Science Department
Subject: Re: [PATCH v3 4/6] stdlib: Sync canonicalize with gnulib [BZ #10635]
 [BZ #26592] [BZ #26341] [BZ #24970]
Message-ID: <275283e0-70ee-5ea4-e63d-d0f1d1393667@cs.ucla.edu>
Date: Fri, 1 Jan 2021 16:04:02 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.5.0
MIME-Version: 1.0
In-Reply-To: <502b6d2d-1139-ca9d-14cf-00082adc915e@linaro.org>
Content-Type: multipart/mixed; boundary="------------9322ED6FCFCBEBC8E4E11ECB"
Content-Language: en-US
Received-SPF: pass client-ip=131.179.128.68; envelope-from=eggert@cs.ucla.edu;
 helo=zimbra.cs.ucla.edu
X-Spam_score_int: -68
X-Spam_score: -6.9
X-Spam_bar: ------
X-Spam_report: (-6.9 / 5.0 requ) BAYES_00=-1.9, NICE_REPLY_A=-2.749,
 RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: bug-gnulib@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Gnulib discussion list <bug-gnulib.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-gnulib>,
 <mailto:bug-gnulib-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/bug-gnulib>
List-Post: <mailto:bug-gnulib@gnu.org>
List-Help: <mailto:bug-gnulib-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-gnulib>,
 <mailto:bug-gnulib-request@gnu.org?subject=subscribe>
Cc: libc-alpha@sourceware.org, bug-gnulib@gnu.org
Errors-To: bug-gnulib-bounces+normalperson=yhbt.net@gnu.org
Sender: "bug-gnulib" <bug-gnulib-bounces+normalperson=yhbt.net@gnu.org>

This is a multi-part message in MIME format.
--------------9322ED6FCFCBEBC8E4E11ECB
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable

On 12/30/20 5:10 AM, Adhemerval Zanella wrote:

>> it is just really a small optimization that adds code complexity
>> on a somewhat convoluted code.

The code is indeed simpler without the NARROW_ADDRESSES optimization, so=20
I removed that optimization by installing the attached patch into Gnulib.

>> For ENAMETOOLONG, I think this is the right error code: it enforces
>> that we do not support internal objects longer than PTRDIFF_MAX.

This sounds backwards, as the code returns ENOMEM every other place it=20
tries to create an internal object longer than PTRDIFF_MAX - these=20
ENOMEM checks are in the malloc calls invoked by scratch_buffer_grow and=20
scratch_buffer_grow_preserve. It would be odd for canonicalize_file_name=20
to return ENAMETOOLONG for this one particular way of creating a=20
too-large object, while at the same time it returns ENOMEM for all the=20
other ways.

Besides, ENAMETOOLONG is the POSIX error code for exceeding NAME_MAX or=20
PATH_MAX, which is not what is happening here.

In Gnulib and other GNU apps we've long used the tradition that ENOMEM=20
means you've run out of memory, regardless of whether it's because your=20
heap or your address space is too small. This is a good tradition and=20
it'd be good to use it here too.

>> I think it should be a fair assumption to make it on internal code,
>> such as realpath

Yes, staying less than PTRDIFF_MAX is a vital assumption on internal=20
objects. I'd go even further and say it's important for user-supplied=20
objects, too, as so much code relies on pointer subtraction and we can't=20
realistically prohibit that within glibc.

> (this is another reason why I think NARROW_ADDRESSES is not=20
> necessary).

Unfortunately, if we merely assume every object has at most PTRDIFF_MAX=20
bytes, we still must check for overflow when adding the sizes of two=20
objects. The NARROW_ADDRESSES optimization would have let us avoid that=20
unnecessary check on 64-bit machines.
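
A minimal sketch of that check, for illustration only.  The helper
add_sizes is hypothetical (it is not Gnulib or glibc code), and it
assumes GCC 5+ or Clang for __builtin_add_overflow, standing in for
Gnulib's INT_ADD_OVERFLOW macro:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper, not Gnulib or glibc code.  Add two object
   sizes.  On success store the sum in *TOTAL and return 0.  If the
   sum wraps around, return ENOMEM, the same error a failed
   allocation would report; *TOTAL is then not meaningful.
   Assumes GCC 5+ or Clang for __builtin_add_overflow.  */
static int
add_sizes (size_t a, size_t b, size_t *total)
{
  if (__builtin_add_overflow (a, b, total))
    return ENOMEM;
  return 0;
}
```

A caller would test the result before growing its buffer, much as the
patched code tests INT_ADD_OVERFLOW (len, n) before the loop that
grows extra_buffer.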

> And your fix (from 93e0186d4) does not really solve the issue, since
> now that len is a size_t the overflow check won't catch a potential
> allocation larger than PTRDIFF_MAX (the realpath will still fail with
> ENOMEM though).

Sure, which means the code is doing the right thing: it's failing with=20
ENOMEM because it ran out of memory. There is no need for an extra=20
PTRDIFF_MAX check in canonicalize.c if malloc (via scratch_buffer_grow)=20
already does the check.

> Wouldn't the below be simpler?
>=20
>                size_t len =3D strlen (end);
>                if (len > IDX_MAX || INT_ADD_OVERFLOW ((idx_t) len, n))
>                  {
>                    __set_errno (ENAMETOOLONG);
>                    goto error_nomem;
>                  }

It's not simpler than the attached Gnulib patch, because it contains an=20
unnecessary comparison to IDX_MAX and an unnecessary cast to idx_t.

--------------9322ED6FCFCBEBC8E4E11ECB
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-canonicalize-remove-NARROW_ADDRESSES-optimization.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename*0="0001-canonicalize-remove-NARROW_ADDRESSES-optimization.patch"

=46rom 8f6b9b66be6672bed1045c27e606dd9fcedcf022 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 1 Jan 2021 15:54:43 -0800
Subject: [PATCH] canonicalize: remove NARROW_ADDRESSES optimization

* lib/canonicalize-lgpl.c, lib/canonicalize.c (NARROW_ADDRESSES):
Remove, and remove all uses, as the optimization is arguably not
worth the extra complexity.  Suggested by Adhemerval Zanella in:
https://sourceware.org/pipermail/libc-alpha/2020-December/121203.html
---
 ChangeLog               | 8 ++++++++
 lib/canonicalize-lgpl.c | 6 +-----
 lib/canonicalize.c      | 6 +-----
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 2d498a5e9..fc45e1176 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2021-01-01  Paul Eggert  <eggert@cs.ucla.edu>
+
+	canonicalize: remove NARROW_ADDRESSES optimization
+	* lib/canonicalize-lgpl.c, lib/canonicalize.c (NARROW_ADDRESSES):
+	Remove, and remove all uses, as the optimization is arguably not
+	worth the extra complexity.  Suggested by Adhemerval Zanella in:
+	https://sourceware.org/pipermail/libc-alpha/2020-December/121203.html
+
 2021-01-01  Bruno Haible  <bruno@clisp.org>
=20
 	stddef: Try harder to get max_align_t defined on OpenBSD.
diff --git a/lib/canonicalize-lgpl.c b/lib/canonicalize-lgpl.c
index 560e24288..698f9ede2 100644
--- a/lib/canonicalize-lgpl.c
+++ b/lib/canonicalize-lgpl.c
@@ -85,10 +85,6 @@
 # define IF_LINT(Code) /* empty */
 #endif
=20
-/* True if adding two valid object sizes might overflow idx_t.
-   As a practical matter, this cannot happen on 64-bit machines.  */
-enum { NARROW_ADDRESSES =3D IDX_MAX >> 31 >> 31 =3D=3D 0 };
-
 #ifndef DOUBLE_SLASH_IS_DISTINCT_ROOT
 # define DOUBLE_SLASH_IS_DISTINCT_ROOT false
 #endif
@@ -343,7 +339,7 @@ realpath_stk (const char *name, char *resolved,
               if (end_in_extra_buffer)
                 end_idx =3D end - extra_buf;
               size_t len =3D strlen (end);
-              if (NARROW_ADDRESSES && INT_ADD_OVERFLOW (len, n))
+              if (INT_ADD_OVERFLOW (len, n))
                 {
                   __set_errno (ENOMEM);
                   goto error_nomem;
diff --git a/lib/canonicalize.c b/lib/canonicalize.c
index cc32260a8..3a1c8098b 100644
--- a/lib/canonicalize.c
+++ b/lib/canonicalize.c
@@ -42,10 +42,6 @@
 # define IF_LINT(Code) /* empty */
 #endif
=20
-/* True if adding two valid object sizes might overflow idx_t.
-   As a practical matter, this cannot happen on 64-bit machines.  */
-enum { NARROW_ADDRESSES =3D IDX_MAX >> 31 >> 31 =3D=3D 0 };
-
 #ifndef DOUBLE_SLASH_IS_DISTINCT_ROOT
 # define DOUBLE_SLASH_IS_DISTINCT_ROOT false
 #endif
@@ -393,7 +389,7 @@ canonicalize_filename_mode_stk (const char *name, can=
onicalize_mode_t can_mode,
               if (end_in_extra_buffer)
                 end_idx =3D end - extra_buf;
               size_t len =3D strlen (end);
-              if (NARROW_ADDRESSES && INT_ADD_OVERFLOW (len, n))
+              if (INT_ADD_OVERFLOW (len, n))
                 xalloc_die ();
               while (extra_buffer.length <=3D len + n)
                 {
--=20
2.27.0


--------------9322ED6FCFCBEBC8E4E11ECB--