From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-4.1 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id AEF7520248 for ; Tue, 16 Apr 2019 22:52:36 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:references :in-reply-to:content-type:content-transfer-encoding :mime-version; q=dns; s=default; b=IYBmFeXT9XtYaJcaiGIQSbR5klKfj LBpt1VBzeYU4FSCGvNjVhxbd8/WOrgCpGfpFZChow/KfqJgdPXwLCqZzbK1/paKQ bYffh9iaUaM/fcGnvKtsjl4WVzj+NRRRghwquG6gNoXa4D4oJ7VwavbRdBumusi4 6HbDHmcSQfYXjQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:references :in-reply-to:content-type:content-transfer-encoding :mime-version; s=default; bh=wLQUbyv1KxfA1mRv4AO9riYa/sM=; b=q2f WdLro3djQ/a3CmGYqrTur5vz9k6vnW/G1lZpBhUxUB2gCSg7a25NrFPMMBGIGhE2 n/3ZBnPNAsjnWbRtSrkB2O88xfew/Q+DFm2Dpi3L6qcPGAijFxifp2cwM61TcZvV XwiU1gsuVx1P0hdd098OoNEEhh/qi8YZkKln5GkE= Received: (qmail 33280 invoked by alias); 16 Apr 2019 22:52:34 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 33272 invoked by uid 89); 16 Apr 2019 22:52:33 -0000 Authentication-Results: sourceware.org; auth=none X-HELO: EUR04-DB3-obe.outbound.protection.outlook.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=FuNOSrNJLg1o4JUyAVu01wKxf6n5c24VzruY9G91nhA=; b=K3i38T5j9Wf2sZYCYRg+DOoVipS04AO0mKKuJFZBnW2P6zDArFQ8LtwYrhQOO8ytiL0Gp2UQCCAN/CUBqRGA4hGdyCRmRurJkH00sYdmmhplXX+AB+26RR8yP7+6IczCWcbWVDF6GtGQcYqf2MkZoS4c1yOu7IHiR2GH86gXwBY= From: Wilco Dijkstra To: Szabolcs Nagy , Zack Weinberg CC: nd , Rich Felker , GNU C Library Subject: Re: [PATCH v2] Improve performance of strstr Date: Tue, 16 Apr 2019 22:52:28 +0000 Message-ID: References: <2b27b4ca-0370-442d-9f39-210265f00444@arm.com> <20190412171613.GB23599@brightrain.aerifal.cx> <20190415144051.GE23599@brightrain.aerifal.cx> <734450c0-1959-6f47-3d8f-35a746f331c9@arm.com> <953ad13a-cd71-32c4-ccc3-d1b8cac57620@arm.com>, In-Reply-To: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Wilco.Dijkstra@arm.com; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED Hi Szabolcs, > this is a good case for twoway, so we need a twoway worst case too > for a real comparision: comparing the worst for new vs worst for > twoway i've seen so far, new is Yes absolutely, it's easy to get 1000 times difference between fast and slow cases, but that's not interesting at all. The goal is to make the aver= age case fast - even if it makes uncommon cases slower. > 4.5x slower on Cortex-A72 > 2.7x slower on Cortex-A57 That shows how much it varies even on 2 similar micro architectures. Note these results are using the optimized Two-Way implementation - the original implementation was more than twice as slow before I optimized it. So it's possible the worst cases are faster now than in GLIBC 2.26. > but there is no guarantee that my inputs are near the real worst > cases, it's hard to tell (and clearly uarch matters too). We could generate lots of cases to try finding a minimum, but the search sp= ace is impossibly large. And the worst cases depend a lot on microarchitecture so = they will be different on different CPUs. Ultimately the 3 key questions are: 1. Is it linear time even on huge inputs? 2. Is it faster than Two-way in the average case? 3. Is it within a small factor even in the worst case? The first 2 were never in doubt, and your result proves 3 as well. Wilco