From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.9 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 94CA91F770 for ; Mon, 31 Dec 2018 21:24:34 +0000 (UTC) Received: (qmail 71430 invoked by alias); 31 Dec 2018 21:24:32 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 71419 invoked by uid 89); 31 Dec 2018 21:24:31 -0000 Authentication-Results: sourceware.org; auth=none X-HELO: brightrain.aerifal.cx Date: Mon, 31 Dec 2018 16:24:23 -0500 From: Rich Felker To: Wilco Dijkstra Cc: 'GNU C Library' , nd Subject: Re: [PATCH v2] Improve performance of strstr Message-ID: <20181231212423.GR23599@brightrain.aerifal.cx> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: Rich Felker On Mon, Dec 31, 2018 at 09:01:56PM +0000, Wilco Dijkstra wrote: > This patch significantly improves performance of strstr compared to the > previous version [1] using a novel modified Horspool algorithm. Needles up > to size 256 use a bad-character table indexed by hashed pairs of characters > to quickly skip past mismatches. Long needles use a self-adapting filtering > step to avoid comparing the whole needle repeatedly. > > By limiting the needle length to 256, the shift table only requires 8 bits > per entry, lowering preprocessing overhead and minimizing cache effects. > This limit also implies worst-case performance is linear. > > Small needles up to size 3 use a dedicated linear search. Very long needles > use the Two-Way algorithm. > > The performance gain using the improved bench-strstr [2] on Cortex-A72 is 5..8 > times basic_strstr and 3.7 times twoway_strstr. > > Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test > (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c). > > OK for commit? This is a major performance regression and should not be committed. The test case does not cover anything near the worst cases. With a limit more like 16 bytes it might be reasonable but 256 is plenty large to see the quadratic effects (64k is a big number). Rich