From: Wilco Dijkstra
To: 'GNU C Library'
CC: nd, Rich Felker
Subject: [PATCH v2] Improve performance of strstr
Date: Mon, 31 Dec 2018 21:01:56 +0000

This patch significantly improves the performance of strstr over the
previous version [1] by using a novel modified Horspool algorithm. Needles
of up to 256 characters use a bad-character table indexed by hashed pairs
of characters to quickly skip past mismatches. Long needles use a
self-adapting filtering step to avoid comparing the whole needle repeatedly.

By limiting the needle length to 256, the shift table only requires 8 bits
per entry, which lowers preprocessing overhead and minimizes cache effects.
This limit also implies that worst-case performance is linear in the
haystack length, since verifying any alignment costs at most 256
comparisons.

Small needles of up to 3 characters use a dedicated linear search. Needles
longer than 256 characters use the Two-Way algorithm.

On Cortex-A72, the improved bench-strstr [2] shows a speedup of 5.8 times
over basic_strstr and 3.7 times over twoway_strstr.

Tested against the GLIBC testsuite, randomized tests and the GNULIB strstr
test (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).

OK for commit?
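To make the hashed-pair bad-character table concrete, here is a simplified standalone C sketch of the idea. It is not the patch's code. It borrows the pair hash from the patch's hash2 macro, but assumes both string lengths are known up front (the patch discovers the haystack length lazily with strnlen), requires a needle of 2 to 256 characters, and leaves out the self-adapting filter. The names pair_hash and pair_horspool are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring the patch's hash2 macro: P points at the
   second character of a pair; the result indexes a 256-entry table.  */
static size_t
pair_hash (const unsigned char *p)
{
  return ((size_t) p[0] - ((size_t) p[-1] << 3)) % 256;
}

/* Hypothetical simplified Horspool search keyed on hashed character pairs
   (no self-adapting filter, no lazy strnlen).  Requires
   2 <= NE_LEN <= 256; returns NULL otherwise or if there is no match.  */
static const char *
pair_horspool (const char *haystack, size_t hs_len,
               const char *needle, size_t ne_len)
{
  const unsigned char *hs = (const unsigned char *) haystack;
  const unsigned char *ne = (const unsigned char *) needle;
  if (ne_len < 2 || ne_len > 256 || hs_len < ne_len)
    return NULL;

  size_t m1 = ne_len - 1;       /* At most 255, so it fits in uint8_t.  */
  uint8_t shift[256];
  memset (shift, 0, sizeof shift);

  /* shift[h] = end position (1..m1) of the rightmost needle pair whose
     hash is h, or 0 if no needle pair hashes to h.  */
  for (size_t i = 1; i < m1; i++)
    shift[pair_hash (ne + i)] = (uint8_t) i;
  /* Skip used when the final pair's hash matches but the needle does not.  */
  size_t shift1 = m1 - shift[pair_hash (ne + m1)];
  shift[pair_hash (ne + m1)] = (uint8_t) m1;

  for (size_t pos = 0; pos <= hs_len - ne_len; )
    {
      /* Look up the pair ending at the last character of the window.  */
      size_t k = shift[pair_hash (hs + pos + m1)];
      if (k == 0)
        pos += m1;              /* No needle pair has this hash.  */
      else if (k < m1)
        pos += m1 - k;          /* Align with the rightmost such pair.  */
      else if (memcmp (hs + pos, ne, ne_len) == 0)
        return (const char *) (hs + pos);
      else
        pos += shift1;
    }
  return NULL;
}
```

Every stored position is at most 255, so a uint8_t table is enough, which is the reason given for the 256-character limit. Hash collisions only cost extra verifications or smaller shifts: the table keeps the rightmost needle pair for each hash, so no shift can jump past a real match.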
[1] https://www.sourceware.org/ml/libc-alpha/2018-10/msg00634.html
[2] https://www.sourceware.org/ml/libc-alpha/2018-12/msg01057.html

ChangeLog:
2018-12-31  Wilco Dijkstra

	* string/str-two-way.h (two_way_short_needle): Add inline to avoid
	warning.
	(two_way_long_needle): Block inlining.
	* string/strstr.c (strstr2): Add new function.
	(strstr3): Likewise.
	(STRSTR): Completely rewrite strstr to improve performance.

--
diff --git a/string/str-two-way.h b/string/str-two-way.h
index 523d946c59412e1f1f65b8ec3778553b83191952..5a800e0eaf1c7505a9340a7aabd149326958df4a 100644
--- a/string/str-two-way.h
+++ b/string/str-two-way.h
@@ -221,7 +221,7 @@ critical_factorization (const unsigned char *needle, size_t needle_len,
    most 2 * HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching.
    If AVAILABLE modifies HAYSTACK_LEN (as in strstr), then at most 3 *
    HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching.  */
-static RETURN_TYPE
+static inline RETURN_TYPE
 two_way_short_needle (const unsigned char *haystack, size_t haystack_len,
 		      const unsigned char *needle, size_t needle_len)
 {
@@ -383,7 +383,7 @@ two_way_short_needle (const unsigned char *haystack, size_t haystack_len,
    If AVAILABLE modifies HAYSTACK_LEN (as in strstr), then at most 3 *
    HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching, and
    sublinear performance is not possible.  */
-static RETURN_TYPE
+__attribute__((noinline)) static RETURN_TYPE
 two_way_long_needle (const unsigned char *haystack, size_t haystack_len,
 		     const unsigned char *needle, size_t needle_len)
 {
diff --git a/string/strstr.c b/string/strstr.c
index f74d7189ed1319f6225525cc2d32380745de1523..aca83626772f5a23da78643edfeec3bf2feafc2d 100644
--- a/string/strstr.c
+++ b/string/strstr.c
@@ -16,21 +16,12 @@
    License along with the GNU C Library; if not, see
    .  */
 
-/* This particular implementation was written by Eric Blake, 2008.  */
-
 #ifndef _LIBC
 # include
 #endif
 
-/* Specification of strstr.  */
 #include
 
-#include
-
-#ifndef _LIBC
-# define __builtin_expect(expr, val)   (expr)
-#endif
-
 #define RETURN_TYPE char *
 #define AVAILABLE(h, h_l, j, n_l)			\
   (((j) + (n_l) <= (h_l)) \
@@ -47,47 +38,124 @@
 #define STRSTR strstr
 #endif
 
-/* Return the first occurrence of NEEDLE in HAYSTACK.  Return HAYSTACK
-   if NEEDLE is empty, otherwise NULL if NEEDLE is not found in
-   HAYSTACK.  */
-char *
-STRSTR (const char *haystack, const char *needle)
+static inline char *
+strstr2 (const unsigned char *hs, const unsigned char *ne)
 {
-  size_t needle_len; /* Length of NEEDLE.  */
-  size_t haystack_len; /* Known minimum length of HAYSTACK.  */
-
-  /* Handle empty NEEDLE special case.  */
-  if (needle[0] == '\0')
-    return (char *) haystack;
+  uint32_t h1 = (ne[0] << 16) | ne[1];
+  uint32_t h2 = 0;
+  for (int c = hs[0]; h1 != h2 && c != 0; c = *++hs)
+    h2 = (h2 << 16) | c;
+  return h1 == h2 ? (char *)hs - 2 : NULL;
+}
 
-  /* Skip until we find the first matching char from NEEDLE.  */
-  haystack = strchr (haystack, needle[0]);
-  if (haystack == NULL || needle[1] == '\0')
-    return (char *) haystack;
+static inline char *
+strstr3 (const unsigned char *hs, const unsigned char *ne)
+{
+  uint32_t h1 = ((uint32_t) ne[0] << 24) | (ne[1] << 16) | (ne[2] << 8);
+  uint32_t h2 = 0;
+  for (int c = hs[0]; h1 != h2 && c != 0; c = *++hs)
+    h2 = (h2 | c) << 8;
+  return h1 == h2 ? (char *)hs - 3 : NULL;
+}
 
-  /* Ensure HAYSTACK length is at least as long as NEEDLE length.
-     Since a match may occur early on in a huge HAYSTACK, use strnlen
+#define hash2(p) (((size_t)(p)[0] - ((size_t)(p)[-1] << 3)) % sizeof (shift))
+
+/* Fast strstr algorithm with guaranteed linear-time performance.
+   Small needles up to size 3 use a dedicated linear search.  Longer needles
+   up to size 256 use a novel modified Horspool algorithm.  It hashes pairs
+   of characters to quickly skip past mismatches.  The main search loop only
+   exits if the last 2 characters match, avoiding unnecessary calls to memcmp
+   and allowing for a larger skip if there is no match.  A self-adapting
+   filtering check is used to quickly detect mismatches in long needles.
+   By limiting the needle length to 256, the shift table can be reduced to 8
+   bits per entry, lowering preprocessing overhead and minimizing cache effects.
+   The limit also implies worst-case performance is linear.
+   Needles larger than 256 characters use the linear-time Two-Way algorithm.  */
+char *
+STRSTR (const char *haystack, const char *needle)
+{
+  const unsigned char *hs = (const unsigned char *) haystack;
+  const unsigned char *ne = (const unsigned char *) needle;
+
+  /* Handle short needle special cases first.  */
+  if (ne[0] == '\0')
+    return (char *)hs;
+  hs = (const unsigned char *)strchr ((const char*)hs, ne[0]);
+  if (hs == NULL || ne[1] == '\0')
+    return (char*)hs;
+  if (ne[2] == '\0')
+    return strstr2 (hs, ne);
+  if (ne[3] == '\0')
+    return strstr3 (hs, ne);
+
+  /* Ensure haystack length is at least as long as needle length.
+     Since a match may occur early on in a huge haystack, use strnlen
      and read ahead a few cachelines for improved performance.  */
-  needle_len = strlen (needle);
-  haystack_len = __strnlen (haystack, needle_len + 256);
-  if (haystack_len < needle_len)
+  size_t ne_len = strlen ((const char*)ne);
+  size_t hs_len = __strnlen ((const char*)hs, ne_len | 512);
+  if (hs_len < ne_len)
     return NULL;
 
-  /* Check whether we have a match.  This improves performance since we avoid
-     the initialization overhead of the two-way algorithm.  */
-  if (memcmp (haystack, needle, needle_len) == 0)
-    return (char *) haystack;
-
-  /* Perform the search.  Abstract memory is considered to be an array
-     of 'unsigned char' values, not an array of 'char' values.  See
-     ISO C 99 section 6.2.6.1.  */
-  if (needle_len < LONG_NEEDLE_THRESHOLD)
-    return two_way_short_needle ((const unsigned char *) haystack,
-				 haystack_len,
-				 (const unsigned char *) needle, needle_len);
-  return two_way_long_needle ((const unsigned char *) haystack, haystack_len,
-			      (const unsigned char *) needle, needle_len);
+  /* Check whether we have a match.  This improves performance since we
+     avoid initialization overheads.  */
+  if (memcmp (hs, ne, ne_len) == 0)
+    return (char *) hs;
+
+  /* Use Two-Way algorithm for very long needles.  */
+  if (__glibc_unlikely (ne_len > 256))
+    return two_way_long_needle (hs, hs_len, ne, ne_len);
+
+  const unsigned char *end = hs + hs_len - ne_len;
+  uint8_t shift[256];
+  size_t tmp, shift1;
+  size_t m1 = ne_len - 1;
+  size_t offset = 0;
+
+  /* Initialize bad character shift hash table.  */
+  memset (shift, 0, sizeof (shift));
+  for (int i = 1; i < m1; i++)
+    shift[hash2 (ne + i)] = i;
+  /* Shift1 is the amount we can skip after matching the hash of the
+     needle end but not the full needle.  */
+  shift1 = m1 - shift[hash2 (ne + m1)];
+  shift[hash2 (ne + m1)] = m1;
+
+  while (1)
+    {
+      if (__glibc_unlikely (hs > end))
+	{
+	  end += __strnlen ((const char*)end + m1 + 1, 2048);
+	  if (hs > end)
+	    return NULL;
+	}
+
+      /* Skip past character pairs not in the needle.  */
+      do
+	{
+	  hs += m1;
+	  tmp = shift[hash2 (hs)];
+	}
+      while (hs <= end && tmp == 0);
+
+      /* If the match is not at the end of the needle, shift to the end
+	 and continue until we match the last 2 characters.  */
+      hs -= tmp;
+      if (tmp < m1)
+	continue;
+
+      /* The hash of the last 2 characters matches.  First compare a few
+	 characters at a self-adapting offset to filter out mismatches.  */
+      if (memcmp (hs + offset, ne + offset, sizeof (int)) == 0)
+	{
+	  if (memcmp (hs, ne, m1) == 0)
+	    return (void *) hs;
+
+	  /* Adjust filter offset when it doesn't find the mismatch.  */
+	  offset = (offset >= sizeof (int) ? offset : m1) - sizeof (int);
+	}
+
+      /* Skip based on matching the last 2 characters.  */
+      hs += shift1;
+    }
 }
 libc_hidden_builtin_def (strstr)
-
-#undef LONG_NEEDLE_THRESHOLD
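The message reports testing with randomized inputs. A minimal sketch of one way to run such a differential test follows; it is an assumption for illustration, not the author's test. It compares whichever strstr the program is linked against with a naive reference on random strings drawn from a tiny alphabet (which includes the byte 0xe9, so characters above 0x7f are covered), with needle lengths up to 299 so that the short-needle, hashed-pair and Two-Way paths can all be reached. The helpers naive_strstr and random_strstr_mismatches are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference: try every starting position in turn.
   Returns HS itself for an empty needle, like strstr.  */
static const char *
naive_strstr (const char *hs, const char *ne)
{
  size_t n = strlen (ne);
  for (;; hs++)
    {
      if (strncmp (hs, ne, n) == 0)
        return hs;
      if (*hs == '\0')
        return NULL;
    }
}

/* Hypothetical harness: returns how many random cases disagree with the
   reference.  A tiny alphabet makes partial matches and repeated
   character pairs common.  */
static int
random_strstr_mismatches (unsigned seed, int iterations)
{
  static const unsigned char alphabet[] = { 'a', 0xe9, 'b', 'z' };
  char hs[1024], ne[300];
  int failures = 0;

  srand (seed);
  for (int it = 0; it < iterations; it++)
    {
      size_t hs_len = (size_t) rand () % sizeof hs;   /* 0 .. 1023 */
      size_t ne_len = (size_t) rand () % sizeof ne;   /* 0 .. 299 */
      int alpha = 1 + rand () % (int) sizeof alphabet;

      for (size_t i = 0; i < hs_len; i++)
        hs[i] = (char) alphabet[rand () % alpha];
      hs[hs_len] = '\0';

      if (hs_len > 0 && rand () % 2 == 0)
        {
          /* Copy a substring of the haystack so that a match exists.  */
          size_t start = (size_t) rand () % hs_len;
          if (ne_len > hs_len - start)
            ne_len = hs_len - start;
          memcpy (ne, hs + start, ne_len);
        }
      else
        for (size_t i = 0; i < ne_len; i++)
          ne[i] = (char) alphabet[rand () % alpha];
      ne[ne_len] = '\0';

      if (strstr (hs, ne) != naive_strstr (hs, ne))
        failures++;
    }
  return failures;
}
```

With a correct strstr the harness returns 0; linking it against a glibc build that includes this patch turns it into a quick regression check. Copying needles out of the haystack half of the time ensures many cases contain a match, which random needles rarely would once they exceed a few dozen characters.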