From: "René Scharfe" <l.s.r@web.de>
To: Git List <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>
Subject: [PATCH 5/9] test-mergesort: add unriffle mode
Date: Fri, 1 Oct 2021 11:16:49 +0200 [thread overview]
Message-ID: <fbeacc0e-d870-2aef-3177-9b661ffc6768@web.de> (raw)
In-Reply-To: <943b1e01-465e-5def-a766-0adf667690de@web.de>
Add a mode that turns sorted items into adversarial input for mergesort.
Do that by running mergesort in reverse and rearranging the items in
such a way that each merge needs the maximum number of operations to
undo it.
To riffle is a card shuffling technique and involves splitting a deck
into two and then to interleave them. A perfect riffle takes one card
from each half in turn. That's similar to the most expensive merge,
which has to take one item from each sublist in turn, which requires the
maximum number of comparisons (n-1).
So unriffle does that in reverse, i.e. it generates the first sublist
out of the items at even indexes and the second sublist out of the items
at odd indexes, without changing their order in any other way. Done
recursively until we reach the trivial sublist length of one, this
twists the list into an order that requires the maximum effort for
mergesort to untangle.
As a baseline, here are the rand distributions with the highest number
of comparisons from "test-tool mergesort test":
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 1184 700 569 OK
rand reverse_1st_half 1023 256 16373 10230 8976 OK
rand reverse_1st_half 1024 512 16384 10240 8993 OK
rand dither 1025 64 18454 11275 9970 OK
And here are the most expensive ones overall:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
stagger reverse 100 64 1184 700 580 OK
sawtooth unriffle 1023 1024 16373 10230 9179 OK
sawtooth unriffle 1024 1024 16384 10240 9217 OK
stagger unriffle 1025 2048 18454 11275 10241 OK
The sawtooth distribution with m>=n generates a sorted list. The
unriffle mode is designed to turn that into adversarial input for
mergesort, and that checks out for n=1023 and n=1024, where it produces
the list that requires the most comparisons.
Item counts that are not powers of two have other winners, and that's
because unriffle recursively splits lists into equal-sized halves, while
llist_mergesort() splits them into the biggest power of two smaller than
n and the rest, e.g. for n=1025 it sorts the first 1024 separately and
finally merges them to the last item.
So unriffle mode works as designed for the intended use case, but to
consistently generate adversarial input for unbalanced merges we need
something else.
Signed-off-by: René Scharfe <l.s.r@web.de>
---
t/helper/test-mergesort.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/t/helper/test-mergesort.c b/t/helper/test-mergesort.c
index 27ba252d4a..d71ef568f3 100644
--- a/t/helper/test-mergesort.c
+++ b/t/helper/test-mergesort.c
@@ -150,6 +150,34 @@ static void mode_dither(int *arr, int n)
arr[i] += i % 5;
}
+static void unriffle(int *arr, int n, int *tmp)
+{
+ int i, j;
+ COPY_ARRAY(tmp, arr, n);
+ for (i = j = 0; i < n; i += 2)
+ arr[j++] = tmp[i];
+ for (i = 1; i < n; i += 2)
+ arr[j++] = tmp[i];
+}
+
+static void unriffle_recursively(int *arr, int n, int *tmp)
+{
+ if (n > 1) {
+ int half = n / 2;
+ unriffle(arr, n, tmp);
+ unriffle_recursively(arr, half, tmp);
+ unriffle_recursively(arr + half, n - half, tmp);
+ }
+}
+
+static void mode_unriffle(int *arr, int n)
+{
+ int *tmp;
+ ALLOC_ARRAY(tmp, n);
+ unriffle_recursively(arr, n, tmp);
+ free(tmp);
+}
+
#define MODE(name) { #name, mode_##name }
static struct mode {
@@ -162,6 +190,7 @@ static struct mode {
MODE(reverse_2nd_half),
MODE(sort),
MODE(dither),
+ MODE(unriffle),
};
static const struct mode *get_mode_by_name(const char *name)
--
2.33.0
next prev parent reply other threads:[~2021-10-01 9:16 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-01 9:07 [PATCH 0/9] mergesort: improve tests and performance René Scharfe
2021-10-01 9:10 ` [PATCH 1/9] test-mergesort: use strbuf_getline() René Scharfe
2021-10-02 9:08 ` Ævar Arnfjörð Bjarmason
2021-10-02 16:56 ` René Scharfe
2021-10-01 9:11 ` [PATCH 2/9] test-mergesort: add sort subcommand René Scharfe
2021-10-01 20:26 ` Junio C Hamano
2021-10-01 9:12 ` [PATCH 3/9] test-mergesort: add test subcommand René Scharfe
2021-10-01 20:26 ` Junio C Hamano
2021-10-02 8:35 ` Ævar Arnfjörð Bjarmason
2021-10-03 10:15 ` René Scharfe
2021-10-03 17:33 ` Junio C Hamano
2021-10-07 20:00 ` René Scharfe
2021-10-08 4:04 ` [PATCH 10/9 v2] test-mergesort: use repeatable random numbers René Scharfe
2021-10-08 4:17 ` Jeff King
2021-10-08 7:23 ` Ævar Arnfjörð Bjarmason
2021-10-08 17:30 ` René Scharfe
2021-10-08 19:00 ` Ævar Arnfjörð Bjarmason
2021-10-03 10:15 ` [PATCH 3/9] test-mergesort: add test subcommand René Scharfe
2021-10-01 9:14 ` [PATCH 4/9] test-mergesort: add generate subcommand René Scharfe
2021-10-01 9:16 ` René Scharfe [this message]
2021-10-01 9:17 ` [PATCH 6/9] test-mergesort: add unriffle_skewed mode René Scharfe
2021-10-01 9:19 ` [PATCH 7/9] p0071: measure sorting of already sorted and reversed files René Scharfe
2021-10-01 9:19 ` [PATCH 8/9] p0071: test performance of llist_mergesort() René Scharfe
2021-10-01 9:22 ` [PATCH 9/9] mergesort: use ranks stack René Scharfe
2022-01-17 17:43 ` Ævar Arnfjörð Bjarmason
2022-01-17 18:22 ` René Scharfe
2022-01-18 5:07 ` René Scharfe
2022-01-18 10:40 ` Ævar Arnfjörð Bjarmason
2022-01-18 12:27 ` René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fbeacc0e-d870-2aef-3177-9b661ffc6768@web.de \
--to=l.s.r@web.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).