Date: Mon, 6 Sep 2021 13:23:05 -0700
To: Adhemerval Zanella
Subject: Re: [PATCH v3 6/7] stdlib: Implement introsort with qsort
Message-ID: <20210906202305.j63jju2oecxz5tfs@google.com>
References: <20210903171144.952737-1-adhemerval.zanella@linaro.org> <20210903171144.952737-7-adhemerval.zanella@linaro.org>
In-Reply-To: <20210903171144.952737-7-adhemerval.zanella@linaro.org>
List-Id: Libc-alpha mailing list
From: Fangrui Song via Libc-alpha
Reply-To: Fangrui Song
Cc: libc-alpha@sourceware.org

On 2021-09-03, Adhemerval Zanella via Libc-alpha wrote:
>This patch adds a introsort implementation on qsort to avoid worse-case
>performance of quicksort to O(nlog n). The heapsort fallback used is a
>heapsort based on Linux implementation (commit 22a241ccb2c19962a). As a
>side note the introsort implementation is similar the one used on
>libstdc++ for std::sort.
>
>Checked on x86_64-linux-gnu.
>---
> stdlib/qsort.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 87 insertions(+), 7 deletions(-)
>
>diff --git a/stdlib/qsort.c b/stdlib/qsort.c
>index 5df640362d..8368576aae 100644
>--- a/stdlib/qsort.c
>+++ b/stdlib/qsort.c
>@@ -113,6 +113,7 @@ typedef struct
> {
>   char *lo;
>   char *hi;
>+  size_t depth;
> } stack_node;
>
> /* The stack needs log (total_elements) entries (we could even subtract
>@@ -122,23 +123,92 @@ typedef struct
> enum { STACK_SIZE = CHAR_BIT * sizeof (size_t) };
>
> static inline stack_node *
>-push (stack_node *top, char *lo, char *hi)
>+push (stack_node *top, char *lo, char *hi, size_t depth)
> {
>   top->lo = lo;
>   top->hi = hi;
>+  top->depth = depth;
>   return ++top;
> }
>
> static inline stack_node *
>-pop (stack_node *top, char **lo, char **hi)
>+pop (stack_node *top, char **lo, char **hi, size_t *depth)
> {
>   --top;
>   *lo = top->lo;
>   *hi = top->hi;
>+  *depth = top->depth;
>   return top;
> }
>
>
>+/* A fast, small, non-recursive O(nlog n) heapsort, adapted from Linux
>+   lib/sort.c. Used on introsort implementation as a fallback routine with
>+   worst-case performance of O(nlog n) and worst-case space complexity of
>+   O(1). */
>+
>+static inline size_t
>+parent (size_t i, unsigned int lsbit, size_t size)
>+{
>+  i -= size;
>+  i -= size & -(i & lsbit);

i -= size & -(i & lsbit); may need a comment that it is logically

   if (i & lsbit) i -= size;

to make i/2 a multiple of size.
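
To illustrate the equivalence mentioned above, here is a minimal sketch that compares the branchless form from the patch with the explicit "if" form. The names parent_branchless, parent_explicit and check_parent_equivalence are hypothetical and exist only for this illustration; lsbit is computed as in the quoted heapsort_r (size & -size). It assumes i is a byte offset that is a non-zero multiple of size.

```c
#include <stddef.h>

/* Branchless form, as written in the quoted patch.  */
static size_t
parent_branchless (size_t i, unsigned int lsbit, size_t size)
{
  i -= size;
  /* -(i & lsbit) is all-ones from bit lsbit upward when that bit is set
     in i, and zero otherwise, so the mask selects either size or 0.  */
  i -= size & -(i & lsbit);
  return i / 2;
}

/* Explicit form: the same logic spelled out with a branch.  */
static size_t
parent_explicit (size_t i, unsigned int lsbit, size_t size)
{
  i -= size;
  if (i & lsbit)
    i -= size;
  return i / 2;
}

/* Hypothetical checker: returns 1 if both forms agree for every element
   size in [1, max_size] and every child index k in [1, max_k], and if
   the result is the byte offset of element (k - 1) / 2.  */
static int
check_parent_equivalence (size_t max_size, size_t max_k)
{
  for (size_t size = 1; size <= max_size; size++)
    {
      const unsigned int lsbit = size & -size;
      for (size_t k = 1; k <= max_k; k++)
        {
          size_t i = k * size;
          size_t p = parent_branchless (i, lsbit, size);
          if (p != parent_explicit (i, lsbit, size))
            return 0;
          if (p != ((k - 1) / 2) * size)
            return 0;
        }
    }
  return 1;
}
```

Calling check_parent_equivalence (64, 1000) from a test should return 1; the second check confirms that the result is always a multiple of size, which is the point of the extra subtraction.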
>+  return i / 2;
>+}
>+
>+static void
>+heapsort_r (void *base, void *end, size_t size, swap_func_t swap_func,
>+            __compar_d_fn_t cmp, void *arg)
>+{
>+  size_t num = ((uintptr_t) end - (uintptr_t) base) / size;
>+  size_t n = num * size, a = (num/2) * size;
>+  /* Used to find parent */
>+  const unsigned int lsbit = size & -size;
>+
>+  /* num < 2 || size == 0. */
>+  if (a == 0)
>+    return;
>+
>+  for (;;)
>+    {
>+      size_t b, c, d;
>+
>+      if (a != 0)
>+        /* Building heap: sift down --a */
>+        a -= size;
>+      else if (n -= size)
>+        /* Sorting: Extract root to --n */
>+        do_swap (base, base + n, size, swap_func);
>+      else
>+        break;
>+
>+      /* Sift element at "a" down into heap. This is the "bottom-up" variant,
>+         which significantly reduces calls to cmp_func(): we find the sift-down
>+         path all the way to the leaves (one compare per level), then backtrack
>+         to find where to insert the target element.
>+
>+         Because elements tend to sift down close to the leaves, this uses fewer
>+         compares than doing two per level on the way down. (A bit more than
>+         half as many on average, 3/4 worst-case.). */
>+      for (b = a; c = 2 * b + size, (d = c + size) < n;)
>+        b = cmp (base + c, base + d, arg) >= 0 ? c : d;
>+      if (d == n)
>+        /* Special case last leaf with no sibling. */
>+        b = c;
>+
>+      /* Now backtrack from "b" to the correct location for "a". */
>+      while (b != a && cmp (base + a, base + b, arg) >= 0)
>+        b = parent (b, lsbit, size);
>+      /* Where "a" belongs. */
>+      c = b;
>+      while (b != a)
>+        {
>+          /* Shift it into place. */
>+          b = parent (b, lsbit, size);
>+          do_swap (base + b, base + c, size, swap_func);
>+        }

here a tab should be used.

>+    }
>+}
>+
> /* Order size using quicksort. This implementation incorporates
>    four optimizations discussed in Sedgewick:
>
>@@ -223,7 +293,7 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
>
>   const size_t max_thresh = MAX_THRESH * size;
>
>-  if (total_elems == 0)
>+  if (total_elems <= 1)
>     /* Avoid lossage with unsigned arithmetic below. */
>     return;
>
>@@ -235,6 +305,9 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
>   else
>     swap_func = SWAP_BYTES;
>
>+  /* Maximum depth before quicksort switches to heapsort. */
>+  size_t depth = 2 * (CHAR_BIT - 1 - __builtin_clzl (total_elems));

With this fixed, the logic LGTM.

>   if (total_elems > MAX_THRESH)
>     {
>       char *lo = base_ptr;
>@@ -242,10 +315,17 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
>       stack_node stack[STACK_SIZE];
>       stack_node *top = stack;
>
>-      top = push (top, NULL, NULL);
>+      top = push (top, NULL, NULL, depth);
>
>       while (stack < top)
>         {
>+          if (depth == 0)
>+            {
>+              heapsort_r (lo, hi, size, swap_func, cmp, arg);
>+              top = pop (top, &lo, &hi, &depth);
>+              continue;
>+            }
>+
>           char *left_ptr;
>           char *right_ptr;
>
>@@ -309,7 +389,7 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
>             {
>               if ((size_t) (hi - left_ptr) <= max_thresh)
>                 /* Ignore both small partitions. */
>-                top = pop (top, &lo, &hi);
>+                top = pop (top, &lo, &hi, &depth);
>               else
>                 /* Ignore small left partition. */
>                 lo = left_ptr;
>@@ -320,13 +400,13 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
>           else if ((right_ptr - lo) > (hi - left_ptr))
>             {
>               /* Push larger left partition indices. */
>-              top = push (top, lo, right_ptr);
>+              top = push (top, lo, right_ptr, depth - 1);
>               lo = left_ptr;
>             }
>           else
>             {
>               /* Push larger right partition indices. */
>-              top = push (top, left_ptr, hi);
>+              top = push (top, left_ptr, hi, depth - 1);
>               hi = right_ptr;
>             }
>         }
>--
>2.30.2
>
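
On the depth limit in the _quicksort hunk: as a sketch only, and assuming the intended limit is 2 * floor(log2(total_elems)) (the usual introsort choice), the leading-zero count would need to be subtracted from the bit width of the operand type rather than from CHAR_BIT alone, since the clz builtins count zeros across the whole type. The helper names floor_log2 and introsort_depth_limit below are hypothetical and not part of the patch; __builtin_clzll is the GCC/Clang builtin, and its result is undefined for 0, so callers must pass n > 0.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: floor(log2(n)) for n > 0.  __builtin_clzll counts
   the leading zero bits of an unsigned long long, so the index of the
   highest set bit is (width - 1 - clz).  */
static inline size_t
floor_log2 (size_t n)
{
  return sizeof (unsigned long long) * CHAR_BIT - 1
         - (size_t) __builtin_clzll ((unsigned long long) n);
}

/* Hypothetical helper: number of partitioning levels allowed before an
   introsort would fall back to heapsort, assuming the 2 * log2(n) rule.  */
static inline size_t
introsort_depth_limit (size_t total_elems)
{
  return 2 * floor_log2 (total_elems);
}
```

With this rule a 1000-element array gets a limit of 18, since floor(log2(1000)) is 9.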