From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-4.0 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id E3EBE20374 for ; Tue, 16 Apr 2019 23:40:39 +0000 (UTC) Received: (qmail 51979 invoked by alias); 16 Apr 2019 23:40:37 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 51971 invoked by uid 89); 16 Apr 2019 23:40:37 -0000 Authentication-Results: sourceware.org; auth=none X-HELO: brightrain.aerifal.cx Date: Tue, 16 Apr 2019 19:40:31 -0400 From: Rich Felker To: Wilco Dijkstra Cc: Szabolcs Nagy , Zack Weinberg , nd , GNU C Library Subject: Re: [PATCH v2] Improve performance of strstr Message-ID: <20190416234031.GH23599@brightrain.aerifal.cx> References: <20190412171613.GB23599@brightrain.aerifal.cx> <20190415144051.GE23599@brightrain.aerifal.cx> <734450c0-1959-6f47-3d8f-35a746f331c9@arm.com> <953ad13a-cd71-32c4-ccc3-d1b8cac57620@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: Rich Felker On Tue, Apr 16, 2019 at 10:52:28PM +0000, Wilco Dijkstra wrote: > Hi Szabolcs, > > > this is a good case for twoway, so we need a twoway worst case too > > for a real comparision: comparing the worst for new vs worst for > > twoway i've seen so far, new is > > Yes absolutely, it's easy to get 1000 times difference between fast and > slow cases, but that's not interesting at all. The goal is to make the average > case fast - even if it makes uncommon cases slower. > > > 4.5x slower on Cortex-A72 > > 2.7x slower on Cortex-A57 > > That shows how much it varies even on 2 similar micro architectures. > Note these results are using the optimized Two-Way implementation - the > original implementation was more than twice as slow before I optimized it. > So it's possible the worst cases are faster now than in GLIBC 2.26. > > > but there is no guarantee that my inputs are near the real worst > > cases, it's hard to tell (and clearly uarch matters too). > > We could generate lots of cases to try finding a minimum, but the search space is > impossibly large. And the worst cases depend a lot on microarchitecture so they > will be different on different CPUs. > > Ultimately the 3 key questions are: > > 1. Is it linear time even on huge inputs? > 2. Is it faster than Two-way in the average case? > 3. Is it within a small factor even in the worst case? > > The first 2 were never in doubt, and your result proves 3 as well. His result was up to 40x slower, which refutes #3. Rich