From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.2 required=3.0 tests=AWL,BAYES_00,BODY_8BITS, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id CDCDD1F462 for ; Mon, 10 Jun 2019 11:46:45 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:references :in-reply-to:content-type:content-transfer-encoding :mime-version; q=dns; s=default; b=WT3ztdL5wfQn1kh+o32l9i0Bom8xn xIACCsECzVnK4CPtPyn9HXiyWo4bNOpxd8w0hRxfQPfvhUHVYBsioOSTdxokMLXw mwaqZvpA4oH02o3tOkm+iGpx2ZEfYbGy6gwXmBVBbum3Wzpo9Cqqn2NnY+RcTrW7 gS5ZW+aYf90snw= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:references :in-reply-to:content-type:content-transfer-encoding :mime-version; s=default; bh=BkfeKEWd4QC8Da/sXYP4OORw0Bg=; b=R3l Zew4lbmi8bsx2syP19O/KPokmpe4DczEdB990gp90sOx1cTE2saWeIrsiZZxpZ7x Lx/ysh7duTqXa40+gwlOmR/jgg2MvaCJJUq30NQdDwLEcwEkVpfWpUh5qqk1AWf9 9Ve6GRWj6YBr1WECjYEdi0jvM4oKO5OuvBDn+yFE= Received: (qmail 74950 invoked by alias); 10 Jun 2019 11:46:43 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 74937 invoked by uid 89); 10 Jun 2019 11:46:42 -0000 Authentication-Results: sourceware.org; auth=none X-HELO: EUR04-DB3-obe.outbound.protection.outlook.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=avNRM9IF6NZ9M1WZfTMJCxhQtGSCDlaHs/kLG1otmzo=; b=lKB1QVupGxkGpfRyT4fBhkfb0nlW/1wOqxzBX7FEFfhSbGVJAbDkorPLod+rd3uFqUifDesIHQMXVOQzKmnkgSJcrQRmCOOt0tN4Zd+yZqqXKTATmnCcCoQlAQUPzMRwC03CiBSUO03p/jvNUfE2R668EVSiR2t87EmoXAdXpqQ= From: Wilco Dijkstra To: Adhemerval Zanella , "libc-alpha@sourceware.org" CC: nd Subject: Re: [PATCH v2] Benchmark strstr hard needles Date: Mon, 10 Jun 2019 11:46:37 +0000 Message-ID: References: ,<3cacddd7-308b-7579-e6f8-50f6c49778cc@linaro.org>, In-Reply-To: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Wilco.Dijkstra@arm.com; x-ms-oob-tlc-oobclassifiers: OLM:2512; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: Wilco.Dijkstra@arm.com Hi Adhemerval, > +char * volatile null =3D NULL; > I am not seeing why exactly it requires a global non static volatile > variable to call do_one_test. You're right, this is not necessary in the current version (IIRC it was to = avoid a warning), so I have removed it. > Maybe add some context from where these tests came from, from the > libc-alpha discussion? Sure, I've improved the comments and description to make it clearer what the hard needle test is doing and why. New version below. Wilco [PATCH v3] Benchmark strstr hard needles v3: Improve comments/description. Benchmark needles which exhibit worst-case performance.=A0 This shows that basic_strstr is quadratic and thus unsuitable for large needles. On the other hand the Two-way and new strstr implementations are linear wit= h increasing needle sizes.=A0 The slowest cases of the two implementations ar= e within a factor of 2 on several different microarchitectures.=A0 Two-way is slowest on inputs which cause a branch mispredict on almost every character= . The new strstr is slowest on inputs which almost match and result in many calls to memcmp.=A0 Thanks to Szabolcs for providing various hard needles. ChangeLog: 2019-05-21=A0 Wilco Dijkstra=A0 =A0=A0=A0=A0=A0=A0=A0 * benchtests/bench-strstr.c (test_hard_needle): New f= unction. -- diff --git a/benchtests/bench-strstr.c b/benchtests/bench-strstr.c index 31309b24029a96de7381e1050fd89e5d26642e5f..5ec6edcf1869dd198a3326cb85f= 2a4463ae3417c 100644 --- a/benchtests/bench-strstr.c +++ b/benchtests/bench-strstr.c @@ -203,6 +203,81 @@ do_test (size_t align1, size_t align2, size_t len1, si= ze_t len2, =A0=A0 putchar ('\n'); =A0} =A0 +/* Test needles which exhibit worst-case performance.=A0 This shows that +=A0=A0 basic_strstr is quadratic and thus unsuitable for large needles. +=A0=A0 On the other hand Two-way and skip table implementations are linear= with +=A0=A0 increasing needle sizes.=A0 The slowest cases of the two implementa= tions are +=A0=A0 within a factor of 2 on several different microarchitectures.=A0 */ + +static void +test_hard_needle (size_t ne_len, size_t hs_len) +{ +=A0 char *ne =3D (char *) buf1; +=A0 char *hs =3D (char *) buf2; + +=A0 /* Hard needle for strstr algorithm using skip table.=A0 This results = in many +=A0=A0=A0=A0 memcmp calls comparing most of the needle.=A0 */ +=A0 { +=A0=A0=A0 memset (ne, 'a', ne_len); +=A0=A0=A0 ne[ne_len] =3D '\0'; +=A0=A0=A0 ne[ne_len - 14] =3D 'b'; + +=A0=A0=A0 memset (hs, 'a', hs_len); +=A0=A0=A0 for (size_t i =3D ne_len; i <=3D hs_len; i +=3D ne_len) +=A0=A0=A0=A0=A0 { +=A0=A0=A0=A0=A0=A0 hs[i-5] =3D 'b'; +=A0=A0=A0=A0=A0=A0 hs[i-62] =3D 'b'; +=A0=A0=A0=A0=A0 } + +=A0=A0=A0 printf ("Length %4zd/%3zd, complex needle 1:", hs_len, ne_len); + +=A0=A0=A0 FOR_EACH_IMPL (impl, 0) +=A0=A0=A0=A0=A0 do_one_test (impl, hs, ne, NULL); +=A0=A0=A0 putchar ('\n'); +=A0 } + +=A0 /* 2nd hard needle for strstr algorithm using skip table.=A0 This resu= lts in +=A0=A0=A0=A0 many memcmp calls comparing most of the needle.=A0 */ +=A0 { +=A0=A0=A0 memset (ne, 'a', ne_len); +=A0=A0=A0 ne[ne_len] =3D '\0'; +=A0=A0=A0 ne[ne_len - 6] =3D 'b'; + +=A0=A0=A0 memset (hs, 'a', hs_len); +=A0=A0=A0 for (size_t i =3D ne_len; i <=3D hs_len; i +=3D ne_len) +=A0=A0=A0=A0=A0 { +=A0=A0=A0=A0=A0=A0 hs[i-5] =3D 'b'; +=A0=A0=A0=A0=A0=A0 hs[i-6] =3D 'b'; +=A0=A0=A0=A0=A0 } + +=A0=A0=A0 printf ("Length %4zd/%3zd, complex needle 2:", hs_len, ne_len); + +=A0=A0=A0 FOR_EACH_IMPL (impl, 0) +=A0=A0=A0=A0=A0 do_one_test (impl, hs, ne, NULL); +=A0=A0=A0 putchar ('\n'); +=A0 } + +=A0 /* Hard needle for Two-way algorithm - the random input causes a large= number +=A0=A0=A0=A0 of branch mispredictions which significantly reduces performa= nce on modern +=A0=A0=A0=A0 micro architectures.=A0 */ +=A0 { +=A0=A0=A0 for (int i =3D 0; i < hs_len; i++) +=A0=A0=A0=A0=A0 hs[i] =3D (rand () & 255) > 155 ? 'a' : 'b'; +=A0=A0=A0 hs[hs_len] =3D 0; + +=A0=A0=A0 memset (ne, 'a', ne_len); +=A0=A0=A0 ne[ne_len-2] =3D 'b'; +=A0=A0=A0 ne[0] =3D 'b'; +=A0=A0=A0 ne[ne_len] =3D 0; + +=A0=A0=A0 printf ("Length %4zd/%3zd, complex needle 3:", hs_len, ne_len); + +=A0=A0=A0 FOR_EACH_IMPL (impl, 0) +=A0=A0=A0=A0=A0 do_one_test (impl, hs, ne, NULL); +=A0=A0=A0 putchar ('\n'); +=A0 } +} + =A0static int =A0test_main (void) =A0{ @@ -227,6 +302,10 @@ test_main (void) =A0=A0=A0=A0=A0=A0=A0=A0 do_test (14, 5, hlen, klen, 1); =A0=A0=A0=A0=A0=A0 } =A0 +=A0 test_hard_needle (64, 65536); +=A0 test_hard_needle (256, 65536); +=A0 test_hard_needle (1024, 65536); + =A0=A0 return ret; =A0} =A0 =