From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.4 required=3.0 tests=AWL,BAYES_00,BODY_8BITS, DKIMWL_WL_MED,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 7FB131F453 for ; Fri, 1 Feb 2019 15:39:22 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:references :in-reply-to:content-type:content-transfer-encoding :mime-version; q=dns; s=default; b=v2cNsUMqbioA80nVFElCj/GDFl26q L+PkZR03YFuBbb7yJpDo8yS38ahn1guNboPMTSGCrhNOJYkZMqN/pxu5u5uv8NMg BHxNJN7yLktQLlXpEZqko58AetdHcMVF26FvU0NhQ7/muYvQg9/Fjj5GFHqI9nEq smWxGWRZTGwtAA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:references :in-reply-to:content-type:content-transfer-encoding :mime-version; s=default; bh=qWTHp/UwHTdVfTSykXdeANuquyE=; b=Ncx 80ycsc/1K8NZhpE5BNJVhhn4dcON8UPCuUz8JQd2dJlQBZCSY0p1zSIPv28MewDg KDgWf1Wwvt+pLPSvFzMikidGzMv3U0DleE0DVzA/xH6iCbYNMJlUJ2l8rrecEnog S/SZm+JqsYcVpwaqy8MtLpm2KhH2Z6D42thsLo0c= Received: (qmail 94145 invoked by alias); 1 Feb 2019 15:39:20 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 94137 invoked by uid 89); 1 Feb 2019 15:39:20 -0000 Authentication-Results: sourceware.org; auth=none X-HELO: EUR02-VE1-obe.outbound.protection.outlook.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=dG2f4zSmzlsoWTga2FtTXYDCM7NnvbzZjS+rWTQX4is=; b=LtNSAPj9tDk8HKR4/BIzl1dVQpdiGcepe7VsuBDai2NrK7Q3fW8709/vU+NEUtCjz1hlZyFofeRWlxPwOqpmFCVZh0FCBntCVZe0HUmjJed5d6guyeFx8s0uabMaOIoSYf7QkIEq8YTPqGqp5ojUB+/NvWCQD9GrNVJRXRrxwz8= From: Wilco Dijkstra To: 'GNU C Library' CC: nd Subject: Re: [PATCH v2] Improve bench-strstr Date: Fri, 1 Feb 2019 15:39:13 +0000 Message-ID: References: In-Reply-To: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Wilco.Dijkstra@arm.com; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED ping From: Wilco Dijkstra Sent: 31 December 2018 16:46 To: 'GNU C Library' Cc: nd Subject: [PATCH v2] Improve bench-strstr =A0=20 Improve bench-strstr by using an extract from the manual as the input to make the test more realistic.=A0 Use the same input for both found and fail cases rather than using a memset of '0' for most of the string, which measures performance of strchr rather than strstr.=A0 Add result checking to catch potential errors.=A0 Remove the repeated tests at slightl= y different alignments and add more large needle and haystack testcases. Replace stupid_strstr with an efficient basic implementation.=A0 Add the Two-way implementation to simplify comparisons with much faster generic implementations. ChangeLog: 2018-12-31=A0 Wilco Dijkstra=A0 =A0=A0=A0=A0=A0=A0=A0 * benchtests/bench-strstr.c (input): Added realistic = input text. =A0=A0=A0=A0=A0=A0=A0 (stupid_strstr): Remove function. =A0=A0=A0=A0=A0=A0=A0 (basic_strstr): Add function. =A0=A0=A0=A0=A0=A0=A0 (twoway_strstr): Add function. =A0=A0=A0=A0=A0=A0=A0 (do_one_test): Add result checking. =A0=A0=A0=A0=A0=A0=A0 (do_test): Use new input text.=A0 Remove accidental e= arly matches. =A0=A0=A0=A0=A0=A0=A0 (test_main): Improve range of tests, reduce unaligned= cases. -- diff --git a/benchtests/bench-strstr.c b/benchtests/bench-strstr.c index a31294e3c96d80a4fd61bb5b423a825fe54d3227..8805e9205a05e3decf4b9fc7dcb= b05bcfe78f39a 100644 --- a/benchtests/bench-strstr.c +++ b/benchtests/bench-strstr.c @@ -16,63 +16,143 @@ =A0=A0=A0 License along with the GNU C Library; if not, see =A0=A0=A0 .=A0 */ =A0 +#define MIN_PAGE_SIZE 131072 =A0#define TEST_MAIN =A0#define TEST_NAME "strstr" =A0#include "bench-string.h" =A0 =A0 -#define STRSTR simple_strstr +#define STRSTR generic_strstr =A0#define libc_hidden_builtin_def(X) =A0#define __strnlen strnlen =A0#include "../string/strstr.c" =A0 +static const char input[] =3D +"This manual is written with the assumption that you are at least " +"somewhat familiar with the C programming language and basic programming " +"concepts.=A0 Specifically, familiarity with ISO standard C (*note ISO " +"C::), rather than =93traditional=94 pre-ISO C dialects, is assumed.\n" + +"=A0=A0 The GNU C Library includes several =93header files=94, each of whi= ch " +"provides definitions and declarations for a group of related facilities; = " +"this information is used by the C compiler when processing your program. = " +"For example, the header file =91stdio.h=92 declares facilities for " +"performing input and output, and the header file =91string.h=92 declares = " +"string processing utilities.=A0 The organization of this manual generally= " +"follows the same division as the header files.\n" + +"=A0=A0 If you are reading this manual for the first time, you should read= " +"all of the introductory material and skim the remaining chapters.=A0 Ther= e " +"are a _lot_ of functions in the GNU C Library and it=92s not realistic to= " +"expect that you will be able to remember exactly _how_ to use each and " +"every one of them.=A0 It=92s more important to become generally familiar = " +"with the kinds of facilities that the library provides, so that when you = " +"are writing your programs you can recognize _when_ to make use of " +"library functions, and _where_ in this manual you can find more specific = " +"information about them.\n"; + =A0 =A0static char * -stupid_strstr (const char *s1, const char *s2) +basic_strstr (const char *s1, const char *s2) =A0{ -=A0 ssize_t s1len =3D strlen (s1); -=A0 ssize_t s2len =3D strlen (s2); +=A0 size_t i; +=A0 int c =3D s2[0]; =A0 -=A0 if (s2len > s1len) -=A0=A0=A0 return NULL; +=A0 if (c =3D=3D 0) +=A0=A0=A0 return (char*)s1; =A0 -=A0 for (ssize_t i =3D 0; i <=3D s1len - s2len; ++i) +=A0 for ( ; s1[0] !=3D '\0'; s1++) =A0=A0=A0=A0 { -=A0=A0=A0=A0=A0 size_t j; -=A0=A0=A0=A0=A0 for (j =3D 0; j < s2len; ++j) -=A0=A0=A0=A0=A0=A0 if (s1[i + j] !=3D s2[j]) +=A0=A0=A0=A0=A0 if (s1[0] !=3D c) +=A0=A0=A0=A0=A0=A0 continue; +=A0=A0=A0=A0=A0 for (i =3D 1; s2[i] !=3D 0; i++) +=A0=A0=A0=A0=A0=A0 if (s1[i] !=3D s2[i]) =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 break; -=A0=A0=A0=A0=A0 if (j =3D=3D s2len) -=A0=A0=A0=A0=A0=A0 return (char *) s1 + i; +=A0=A0=A0=A0=A0 if (s2[i] =3D=3D '\0') +=A0=A0=A0=A0=A0=A0 return (char*)s1; =A0=A0=A0=A0 } =A0 =A0=A0 return NULL; =A0} =A0 +#define RETURN_TYPE char * +#define AVAILABLE(h, h_l, j, n_l)=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0 \ +=A0 (((j) + (n_l) <=3D (h_l)) \ +=A0=A0 || ((h_l) +=3D __strnlen ((void*)((h) + (h_l)), (n_l) + 512), \ +=A0=A0=A0=A0=A0=A0 (j) + (n_l) <=3D (h_l))) +#define CHECK_EOL (1) +#define RET0_IF_0(a) if (!a) goto ret0 +#define FASTSEARCH(S,C,N) (void*) strchr ((void*)(S), (C)) +#define LONG_NEEDLE_THRESHOLD 32U + +static char * +twoway_strstr (const char *haystack, const char *needle) +{ +=A0 size_t needle_len; /* Length of NEEDLE.=A0 */ +=A0 size_t haystack_len; /* Known minimum length of HAYSTACK.=A0 */ + +=A0 /* Handle empty NEEDLE special case.=A0 */ +=A0 if (needle[0] =3D=3D '\0') +=A0=A0=A0 return (char *) haystack; + +=A0 /* Skip until we find the first matching char from NEEDLE.=A0 */ +=A0 haystack =3D strchr (haystack, needle[0]); +=A0 if (haystack =3D=3D NULL || needle[1] =3D=3D '\0') +=A0=A0=A0 return (char *) haystack; + +=A0 /* Ensure HAYSTACK length is at least as long as NEEDLE length. +=A0=A0=A0=A0 Since a match may occur early on in a huge HAYSTACK, use strn= len +=A0=A0=A0=A0 and read ahead a few cachelines for improved performance.=A0 = */ +=A0 needle_len =3D strlen (needle); +=A0 haystack_len =3D __strnlen (haystack, needle_len + 256); +=A0 if (haystack_len < needle_len) +=A0=A0=A0 return NULL; + +=A0 /* Check whether we have a match.=A0 This improves performance since w= e avoid +=A0=A0=A0=A0 the initialization overhead of the two-way algorithm.=A0 */ +=A0 if (memcmp (haystack, needle, needle_len) =3D=3D 0) +=A0=A0=A0 return (char *) haystack; + +=A0 /* Perform the search.=A0 Abstract memory is considered to be an array +=A0=A0=A0=A0 of 'unsigned char' values, not an array of 'char' values.=A0 = See +=A0=A0=A0=A0 ISO C 99 section 6.2.6.1.=A0 */ +=A0 if (needle_len < LONG_NEEDLE_THRESHOLD) +=A0=A0=A0 return two_way_short_needle ((const unsigned char *) haystack, +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0 haystack_len, +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0 (const unsigned char *) needle, needle_len); +=A0 return two_way_long_needle ((const unsigned char *) haystack, haystack= _len, +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0 (const unsigned char *) needle, needle_len); +} =A0 =A0typedef char *(*proto_t) (const char *, const char *); =A0 -IMPL (stupid_strstr, 0) -IMPL (simple_strstr, 0) =A0IMPL (strstr, 1) - +IMPL (generic_strstr, 0) +IMPL (twoway_strstr, 0) +IMPL (basic_strstr, 0) =A0 =A0static void =A0do_one_test (impl_t *impl, const char *s1, const char *s2, char *exp_res= ult) =A0{ =A0=A0 size_t i, iters =3D INNER_LOOP_ITERS; =A0=A0 timing_t start, stop, cur; +=A0 char *res; =A0 =A0=A0 TIMING_NOW (start); =A0=A0 for (i =3D 0; i < iters; ++i) -=A0=A0=A0 { -=A0=A0=A0=A0=A0 CALL (impl, s1, s2); -=A0=A0=A0 } +=A0=A0=A0 res =3D CALL (impl, s1, s2); =A0=A0 TIMING_NOW (stop); =A0 =A0=A0 TIMING_DIFF (cur, start, stop); =A0 =A0=A0 TIMING_PRINT_MEAN ((double) cur, (double) iters); + +=A0 if (res !=3D exp_result) +=A0=A0=A0 { +=A0=A0=A0=A0=A0 error (0, 0, "Wrong result in function %s %s %s", impl->na= me, +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 res, exp_result); +=A0=A0=A0=A0=A0 ret =3D 1; +=A0=A0=A0 } =A0} =A0 =A0 @@ -83,36 +163,42 @@ do_test (size_t align1, size_t align2, size_t len1, si= ze_t len2, =A0=A0 char *s1 =3D (char *) (buf1 + align1); =A0=A0 char *s2 =3D (char *) (buf2 + align2); =A0 -=A0 static const char d[] =3D "1234567890abcdef"; -#define dl (sizeof (d) - 1) -=A0 char *ss2 =3D s2; -=A0 for (size_t l =3D len2; l > 0; l =3D l > dl ? l - dl : 0) -=A0=A0=A0 { -=A0=A0=A0=A0=A0 size_t t =3D l > dl ? dl : l; -=A0=A0=A0=A0=A0 ss2 =3D mempcpy (ss2, d, t); -=A0=A0=A0 } -=A0 s2[len2] =3D '\0'; +=A0 size_t size =3D sizeof (input) - 1; +=A0 size_t pos =3D (len1 + len2) % size; =A0 -=A0 if (fail) +=A0 char *ss2 =3D s2; +=A0 for (size_t l =3D len2; l > 0; l =3D l > size ? l - size : 0) =A0=A0=A0=A0 { -=A0=A0=A0=A0=A0 char *ss1 =3D s1; -=A0=A0=A0=A0=A0 for (size_t l =3D len1; l > 0; l =3D l > dl ? l - dl : 0) +=A0=A0=A0=A0=A0 size_t t =3D l > size ? size : l; +=A0=A0=A0=A0=A0 if (pos + t <=3D size) +=A0=A0=A0=A0=A0=A0 ss2 =3D mempcpy (ss2, input + pos, t); +=A0=A0=A0=A0=A0 else =A0=A0=A0=A0=A0=A0=A0=A0 { -=A0=A0=A0=A0=A0=A0=A0=A0 size_t t =3D l > dl ? dl : l; -=A0=A0=A0=A0=A0=A0=A0=A0 memcpy (ss1, d, t); -=A0=A0=A0=A0=A0=A0=A0=A0 ++ss1[len2 > 7 ? 7 : len2 - 1]; -=A0=A0=A0=A0=A0=A0=A0=A0 ss1 +=3D t; +=A0=A0=A0=A0=A0=A0=A0=A0 ss2 =3D mempcpy (ss2, input + pos, size - pos); +=A0=A0=A0=A0=A0=A0=A0=A0 ss2 =3D mempcpy (ss2, input, t - (size - pos)); =A0=A0=A0=A0=A0=A0=A0=A0 } =A0=A0=A0=A0 } -=A0 else +=A0 s2[len2] =3D '\0'; + +=A0 char *ss1 =3D s1; +=A0 for (size_t l =3D len1; l > 0; l =3D l > size ? l - size : 0) =A0=A0=A0=A0 { -=A0=A0=A0=A0=A0 memset (s1, '0', len1); -=A0=A0=A0=A0=A0 memcpy (s1 + len1 - len2, s2, len2); +=A0=A0=A0=A0=A0 size_t t =3D l > size ? size : l; +=A0=A0=A0=A0=A0 memcpy (ss1, input, t); +=A0=A0=A0=A0=A0 ss1 +=3D t; =A0=A0=A0=A0 } + +=A0 if (!fail) +=A0=A0=A0 memcpy (s1 + len1 - len2, s2, len2); =A0=A0 s1[len1] =3D '\0'; =A0 -=A0 printf ("Length %4zd/%zd, alignment %2zd/%2zd, %s:", -=A0=A0=A0=A0=A0=A0=A0=A0 len1, len2, align1, align2, fail ? "fail" : "foun= d"); +=A0 /* Remove any accidental matches except for the last if !fail.=A0 */ +=A0 for (ss1 =3D basic_strstr (s1, s2); ss1; ss1 =3D basic_strstr (ss1 + 1= , s2)) +=A0=A0=A0 if (fail || ss1 !=3D s1 + len1 - len2) +=A0=A0=A0=A0=A0 ++ss1[len2 / 2]; + +=A0 printf ("Length %4zd/%3zd, alignment %2zd/%2zd, %s:", +=A0=A0=A0=A0=A0=A0=A0=A0 len1, len2, align1, align2, fail ? "fail " : "fou= nd"); =A0 =A0=A0 FOR_EACH_IMPL (impl, 0) =A0=A0=A0=A0 do_one_test (impl, s1, s2, fail ? NULL : s1 + len1 - len2); @@ -130,48 +216,19 @@ test_main (void) =A0=A0=A0=A0 printf ("\t%s", impl->name); =A0=A0 putchar ('\n'); =A0 -=A0 for (size_t klen =3D 2; klen < 32; ++klen) -=A0=A0=A0 for (size_t hlen =3D 2 * klen; hlen < 16 * klen; hlen +=3D klen) +=A0 for (size_t hlen =3D 64; hlen <=3D 256; hlen +=3D 32) +=A0=A0=A0 for (size_t klen =3D 1; klen <=3D 16; klen++) =A0=A0=A0=A0=A0=A0 { -=A0=A0=A0=A0=A0=A0 do_test (0, 0, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (0, 0, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (0, 3, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (0, 3, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (0, 9, hlen, klen, 0); +=A0=A0=A0=A0=A0=A0 do_test (1, 3, hlen, klen, 0); =A0=A0=A0=A0=A0=A0=A0=A0 do_test (0, 9, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (0, 15, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (0, 15, hlen, klen, 1); - -=A0=A0=A0=A0=A0=A0 do_test (3, 0, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (3, 0, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (3, 3, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (3, 3, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (3, 9, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (3, 9, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (3, 15, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (3, 15, hlen, klen, 1); - -=A0=A0=A0=A0=A0=A0 do_test (9, 0, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (9, 0, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (9, 3, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (9, 3, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (9, 9, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (9, 9, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (9, 15, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (9, 15, hlen, klen, 1); - -=A0=A0=A0=A0=A0=A0 do_test (15, 0, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (15, 0, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (15, 3, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (15, 3, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (15, 9, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (15, 9, hlen, klen, 1); -=A0=A0=A0=A0=A0=A0 do_test (15, 15, hlen, klen, 0); -=A0=A0=A0=A0=A0=A0 do_test (15, 15, hlen, klen, 1); =A0=A0=A0=A0=A0=A0 } =A0 -=A0 do_test (0, 0, page_size - 1, 16, 0); -=A0 do_test (0, 0, page_size - 1, 16, 1); +=A0 for (size_t hlen =3D 256; hlen <=3D 65536; hlen *=3D 2) +=A0=A0=A0 for (size_t klen =3D 16; klen <=3D 256; klen *=3D 2) +=A0=A0=A0=A0=A0 { +=A0=A0=A0=A0=A0=A0 do_test (1, 11, hlen, klen, 0); +=A0=A0=A0=A0=A0=A0 do_test (14, 5, hlen, klen, 1); +=A0=A0=A0=A0=A0 } =A0 =A0=A0 return ret; =A0} =