unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: "Zack Weinberg" <zack@owlfolio.org>
To: "Paul Eggert" <eggert@cs.ucla.edu>,
	"Siddhesh Poyarekar" <siddhesh@gotplt.org>,
	"Vincent Lefevre" <vincent@vinc17.net>,
	"Xi Ruoyao" <xry111@xry111.site>,
	"Adhemerval Zanella" <adhemerval.zanella@linaro.org>,
	"Turritopsis Dohrnii Teo En Ming" <teo.en.ming@protonmail.com>,
	"GNU libc development" <libc-alpha@sourceware.org>,
	"ceo@teo-en-ming-corp.com" <ceo@teo-en-ming-corp.com>
Subject: Re: New GNU C Library (glibc) security flaw reported on 30 Jan 2024
Date: Wed, 07 Feb 2024 12:07:24 -0500	[thread overview]
Message-ID: <4d94a528-fe3f-413d-afa0-91a41f8371ff@app.fastmail.com> (raw)
In-Reply-To: <ef8e1640-1074-4952-8997-dda56396d565@cs.ucla.edu>

[-- Attachment #1: Type: text/plain, Size: 1808 bytes --]

On Tue, Feb 6, 2024, at 4:30 PM, Paul Eggert wrote:
> On 2/6/24 07:00, Zack Weinberg wrote:
>> This sentence only makes sense to me because of what you said in the
>> cover letter, about the array not being required to be totally
>> ordered.
> the paraphrase you sent was too generous, as it allowed the 
> array to be in completely random order if it had no matching element
> ... we're likely better off sticking with POSIXish wording.

I see what was wrong with the way I wrote it, but I really don't like
the "POSIXish wording".  To me it trips over the English-language ambiguity
of whether the "all" in "all X, Y, and Z" is meant to apply to Y and Z as
well as X. I revised my suggestion again (see attached patch).

>> and C2011 7.22.5p2 actually makes this a hard requirement: “The
>> implementation shall ensure that both arguments [of the comparison
>> function called by qsort] are pointers to elements of the array.”
>> It looks to me like there are situations where our implementation
>> doesn’t do this:
>
> I don't see that in the glibc source. Are you sure about that?

I was wrong about this; I misread the code in qsort.c.  Specifically,
I missed that msort_with_tmp always merges *from* two subsets of the
main array, *into* the temporary array, and then copies the result back
to the main array.  Still, I think it is likely that some C libraries in
the past have failed to implement this requirement (intentionally or not).
In the attached patch I revised that suggestion again and I hope it satisfies
everyone's expectations now.

I might send another patch for this file later; I think it would be clearer to
consolidate most of the new language about how the comparison function is required
to behave into the "Comparison Functions" node.

zw

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-bsearch-qsort-etc.-doc-to-match-POSIX-better.patch --]
[-- Type: text/x-patch; name="0001-Fix-bsearch-qsort-etc.-doc-to-match-POSIX-better.patch", Size: 7390 bytes --]

From 30259d4588c0b587624afb55daa86fe469fe2d8e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 4 Feb 2024 16:53:22 -0800
Subject: [PATCH] Fix bsearch, qsort etc. doc to match POSIX better
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* manual/search.texi (Array Search Function):
Correct the statement about lfind’s mean runtime:
it is proportional to a number (not that number),
and this is true only if random elements are searched for.
Relax the constraint on bsearch’s array argument:
POSIX says it need not be sorted, only partially sorted.
Say that the first arg passed to bsearch’s comparison function
is the key, and the second arg is an array element, as
POSIX requires.  For bsearch and qsort, say that the
comparison function should not alter the array, as POSIX
requires.  For qsort, say that the comparison function
must define a total order, as POSIX requires.  Add a warning
that the comparison function should not rely on object addresses.
Clarify how to ensure a stable sort, usage of temporary storage,
and what algorithmic properties qsort may have.
---
 manual/search.texi | 63 ++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/manual/search.texi b/manual/search.texi
index db577a5332..4f373d21ca 100644
--- a/manual/search.texi
+++ b/manual/search.texi
@@ -84,8 +84,9 @@ The return value is a pointer to the matching element in the array
 starting at @var{base} if it is found.  If no matching element is
 available @code{NULL} is returned.
 
-The mean runtime of this function is @code{*@var{nmemb}}/2.  This
-function should only be used if elements often get added to or deleted from
+The mean runtime of this function is proportional to @code{*@var{nmemb}/2},
+assuming random elements of the array are searched for.  This
+function should be used only if elements often get added to or deleted from
 the array in which case it might not be useful to sort the array before
 searching.
 @end deftypefun
@@ -122,24 +123,31 @@ bytes.  If one is sure the element is in the array it is better to use
 calling @code{lsearch}.
 @end deftypefun
 
-To search a sorted array for an element matching the key, use the
-@code{bsearch} function.  The prototype for this function is in
+To search a sorted or partially sorted array for an element matching the key,
+use the @code{bsearch} function.  The prototype for this function is in
 the header file @file{stdlib.h}.
 @pindex stdlib.h
 
 @deftypefun {void *} bsearch (const void *@var{key}, const void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
 @standards{ISO, stdlib.h}
 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
-The @code{bsearch} function searches the sorted array @var{array} for an object
+The @code{bsearch} function searches @var{array} for an object
 that is equivalent to @var{key}.  The array contains @var{count} elements,
 each of which is of size @var{size} bytes.
 
 The @var{compare} function is used to perform the comparison.  This
-function is called with two pointer arguments and should return an
+function is called with arguments that point to the key and to an
+array element, in that order, and should return an
 integer less than, equal to, or greater than zero corresponding to
-whether its first argument is considered less than, equal to, or greater
-than its second argument.  The elements of the @var{array} must already
-be sorted in ascending order according to this comparison function.
+whether the key is considered less than, equal to, or greater than
+the array element.  The function should not alter the array's contents.
+
+@var{array} does not need to be completely sorted according to
+@var{compare}, but it does need to be partially sorted with respect to
+@var{key}.  That is, the array must begin with all the elements that
+compare less than @var{key}; next must come all the elements that
+compare equal to @var{key}; and last, all the elements that compare
+greater than @var{key}.  (Any or all of these sub-sequences may be empty.)
 
 The return value is a pointer to the matching array element, or a null
 pointer if no match is found.  If the array contains more than one element
@@ -170,21 +178,33 @@ The @var{compare} function is used to perform the comparison on the
 array elements.  This function is called with two pointer arguments and
 should return an integer less than, equal to, or greater than zero
 corresponding to whether its first argument is considered less than,
-equal to, or greater than its second argument.
+equal to, or greater than its second argument.  The function must
+not alter the array's contents, and must define a total ordering
+of all the array elements, including any unusual values such as
+floating-point NaN (@pxref{Infinity and NaN}) that are present.
+
+@code{qsort} may attempt to allocate large amounts of temporary
+storage, using @code{malloc}.  However, memory allocation failure will
+not prevent it from sorting the array.
 
 @cindex stable sorting
 @strong{Warning:} If two objects compare as equal, their order after
 sorting is unpredictable.  That is to say, the sorting is not stable.
 This can make a difference when the comparison considers only part of
 the elements.  Two elements with the same sort key may differ in other
-respects.
+respects.  The only way to ensure a stable sort, when @var{compare}
+does not consider all of the data in the objects being sorted, is to
+augment each object with a tie-breaking value, such as its original
+array index.
 
-Although the object addresses passed to the comparison function lie
-within the array, they need not correspond with the original locations
-of those objects because the sorting algorithm may swap around objects
-in the array before making some comparisons.  The only way to perform
-a stable sort with @code{qsort} is to first augment the objects with a
-monotonic counter of some kind.
+@strong{Warning:} The result of @var{compare} should not depend in
+any way on the @emph{addresses} of the objects it is comparing.
+ISO C requires that both addresses passed to the comparison function
+always lie within the original array, and @theglibc{} honors this
+requirement, but other C libraries might not.  More importantly, the
+sorting process may temporarily move objects out of order, so the
+relative positions of objects within the array are meaningless while
+@code{qsort} is running.
 
 Here is a simple example of sorting an array of @code{long int} in numerical
 order, using the comparison function defined above (@pxref{Comparison
@@ -200,11 +220,10 @@ Functions}):
 @end smallexample
 
 The @code{qsort} function derives its name from the fact that it was
-originally implemented using the ``quick sort'' algorithm.
-
-The implementation of @code{qsort} attempts to allocate auxiliary storage
-and use the merge sort algorithm, without violating C standard requirement
-that arguments passed to the comparison function point within the array.
+originally implemented using the ``quick sort'' algorithm.  Modern C
+libraries (including @theglibc{}) may or may not use this algorithm.
+However, you can rely on @code{qsort} to be asymptotically optimal in
+the average case (i.e.@: @math{O(n \log n)}).
 @end deftypefun
 
 @node Search/Sort Example
-- 
2.43.0


  parent reply	other threads:[~2024-02-07 17:08 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 14:08 New GNU C Library (glibc) security flaw reported on 30 Jan 2024 Turritopsis Dohrnii Teo En Ming
2024-01-31 14:23 ` Xi Ruoyao
2024-01-31 14:55   ` Vincent Lefevre
2024-01-31 15:52     ` Adhemerval Zanella Netto
2024-01-31 16:23       ` Vincent Lefevre
2024-01-31 16:44         ` Siddhesh Poyarekar
2024-01-31 18:47       ` Xi Ruoyao
2024-02-01  0:51         ` Vincent Lefevre
2024-02-01  1:03           ` Vincent Lefevre
2024-02-01  6:41           ` Xi Ruoyao
2024-02-01  9:07             ` Vincent Lefevre
2024-02-01 19:55               ` Paul Eggert
2024-02-01 21:11                 ` Siddhesh Poyarekar
2024-02-05  0:58                   ` Paul Eggert
2024-02-06 15:00                     ` Zack Weinberg
2024-02-06 21:30                       ` Paul Eggert
2024-02-06 22:04                         ` Xi Ruoyao
2024-02-07 17:07                         ` Zack Weinberg [this message]
2024-02-07 19:55                           ` Alexander Monakov
2024-02-07 20:45                             ` Zack Weinberg
2024-02-07 21:53                               ` Alexander Monakov
2024-02-07 22:56                               ` Paul Eggert
2024-04-06 17:17                           ` Paul Eggert
2024-04-08  8:28                             ` Florian Weimer
2024-04-22 14:39                               ` Zack Weinberg
2024-04-23 18:09                                 ` Paul Eggert
2024-04-23 18:26                                   ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4d94a528-fe3f-413d-afa0-91a41f8371ff@app.fastmail.com \
    --to=zack@owlfolio.org \
    --cc=adhemerval.zanella@linaro.org \
    --cc=ceo@teo-en-ming-corp.com \
    --cc=eggert@cs.ucla.edu \
    --cc=libc-alpha@sourceware.org \
    --cc=siddhesh@gotplt.org \
    --cc=teo.en.ming@protonmail.com \
    --cc=vincent@vinc17.net \
    --cc=xry111@xry111.site \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).