From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-101449-e=80x24.org@sourceware.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN: AS31976 209.132.180.0/23
X-Spam-Status: No, score=-4.0 required=3.0 tests=AWL,BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,
	SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.2
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id E3EBE20374
	for <e@80x24.org>; Tue, 16 Apr 2019 23:40:39 +0000 (UTC)
Received: (qmail 51979 invoked by alias); 16 Apr 2019 23:40:37 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-e=80x24.org@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 51971 invoked by uid 89); 16 Apr 2019 23:40:37 -0000
Authentication-Results: sourceware.org; auth=none
X-HELO: brightrain.aerifal.cx
Date: Tue, 16 Apr 2019 19:40:31 -0400
From: Rich Felker <dalias@libc.org>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>, Zack Weinberg <zackw@panix.com>,
	nd <nd@arm.com>, GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH v2] Improve performance of strstr
Message-ID: <20190416234031.GH23599@brightrain.aerifal.cx>
References: <AM6PR08MB507895A65E1F0EE5780D39C983280@AM6PR08MB5078.eurprd08.prod.outlook.com>
 <20190412171613.GB23599@brightrain.aerifal.cx>
 <AM6PR08MB50783C3A091B45F16965261E832B0@AM6PR08MB5078.eurprd08.prod.outlook.com>
 <20190415144051.GE23599@brightrain.aerifal.cx>
 <AM6PR08MB5078A8FE8466F406C3BCD11B832B0@AM6PR08MB5078.eurprd08.prod.outlook.com>
 <CAKCAbMhGx-pmWm53WeqahJQerkU8iHJ50cfuzQ6Y_Ktiq4bwMw@mail.gmail.com>
 <734450c0-1959-6f47-3d8f-35a746f331c9@arm.com>
 <953ad13a-cd71-32c4-ccc3-d1b8cac57620@arm.com>
 <e9fe9dd8-8204-311b-9001-9e50d840ae1b@arm.com>
 <AM6PR08MB5078D25F3AD9DE31CB3806AC83240@AM6PR08MB5078.eurprd08.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <AM6PR08MB5078D25F3AD9DE31CB3806AC83240@AM6PR08MB5078.eurprd08.prod.outlook.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: Rich Felker <dalias@aerifal.cx>

On Tue, Apr 16, 2019 at 10:52:28PM +0000, Wilco Dijkstra wrote:
> Hi Szabolcs,
> 
> > this is a good case for twoway, so we need a twoway worst case too
> > for a real comparision: comparing the worst for new vs worst for
> > twoway i've seen so far, new is
> 
> Yes absolutely, it's easy to get 1000 times difference between fast and
> slow cases, but that's not interesting at all. The goal is to make the average
> case fast - even if it makes uncommon cases slower.
> 
> > 4.5x slower on Cortex-A72
> > 2.7x slower on Cortex-A57
> 
> That shows how much it varies even on 2 similar micro architectures.
> Note these results are using the optimized Two-Way implementation - the
> original implementation was more than twice as slow before I optimized it.
> So it's possible the worst cases are faster now than in GLIBC 2.26.
> 
> > but there is no guarantee that my inputs are near the real worst
> > cases, it's hard to tell (and clearly uarch matters too).
> 
> We could generate lots of cases to try finding a minimum, but the search space is
> impossibly large. And the worst cases depend a lot on microarchitecture so they
> will be different on different CPUs.
> 
> Ultimately the 3 key questions are:
> 
> 1. Is it linear time even on huge inputs?
> 2. Is it faster than Two-way in the average case?
> 3. Is it within a small factor even in the worst case?
> 
> The first 2 were never in doubt, and your result proves 3 as well.

His result was up to 40x slower, which refutes #3.

Rich