From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-101396-e=80x24.org@sourceware.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN: AS31976 209.132.180.0/23
X-Spam-Status: No, score=-4.1 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_EF,HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SPF_PASS
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id AF33420248
	for <e@80x24.org>; Mon, 15 Apr 2019 18:02:34 +0000 (UTC)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id
	:list-unsubscribe:list-subscribe:list-archive:list-post
	:list-help:sender:from:to:cc:subject:date:message-id:references
	:in-reply-to:content-type:content-transfer-encoding
	:mime-version; q=dns; s=default; b=qfwGaHsv2MlrM4hDXNUSkLKPdOgF3
	5HacuJrSzsJgVQPChPyYnGZAJZ7TUKlAR5YXDQgOqwnWPE+5xHnziM/7/2WgxbxZ
	ufrODc8qCmLfX1CoN5VfKKp87N+ijKdQRttXIvstWHScbu186D1I27mxGlaVmEBt
	Z2sCYHc5d1MJ3g=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id
	:list-unsubscribe:list-subscribe:list-archive:list-post
	:list-help:sender:from:to:cc:subject:date:message-id:references
	:in-reply-to:content-type:content-transfer-encoding
	:mime-version; s=default; bh=xKz7ob24o+5wIn1bmGxsI1VxajQ=; b=h7s
	HNiM+lyvYRQy3jGCvP5q/WKPnF24N0SllgyEbGRkdD+O9p0l82DAJmjqjeLIyDWv
	BkYXbyaRULKTP6xAEnxYgEipqpPHe5eVQXmiGTKrj3OkFcmALxHhPQkw67TGoURF
	GPaJvNK8ijXM2Z8gYk++ipYHG8BJF7jMcc8+CeKY=
Received: (qmail 122741 invoked by alias); 15 Apr 2019 18:02:32 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-e=80x24.org@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 122677 invoked by uid 89); 15 Apr 2019 18:02:32 -0000
Authentication-Results: sourceware.org; auth=none
X-HELO: EUR01-HE1-obe.outbound.protection.outlook.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com;
 s=selector1-arm-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=ial5G4TAHGEfybRJDF7lfFYYq1vTeY0UDg3hIPVt/6c=;
 b=aCJQgj38WjL9XFPBk6MCb1nFgxTRfAwYxOA+RRLawhmTXwRTA2GfMIlsLz7mqcYJ+TNNq+FBEYMKd04LVB+miYjI4Xd4GS9Q10AxeaEbuVQ0O8lmZjhDnDk9nPiNNks0SsYlQ5ewF5kh10tWZEZ8QZZ151A5DG8Gevs+Tj0M3Qo=
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: Rich Felker <dalias@libc.org>
CC: Szabolcs Nagy <Szabolcs.Nagy@arm.com>, 'GNU C Library'
	<libc-alpha@sourceware.org>, nd <nd@arm.com>
Subject: Re: [PATCH v2] Improve performance of strstr
Date: Mon, 15 Apr 2019 18:02:27 +0000
Message-ID:
 <AM6PR08MB5078A8FE8466F406C3BCD11B832B0@AM6PR08MB5078.eurprd08.prod.outlook.com>
References:
 <DB5PR08MB1030CDDFA10DCA6E414A844683B20@DB5PR08MB1030.eurprd08.prod.outlook.com>
 <DB5PR08MB1030F31182F495BA589BAF9A83920@DB5PR08MB1030.eurprd08.prod.outlook.com>
 <DB5PR08MB1030487C1795F6E4E018BA5383730@DB5PR08MB1030.eurprd08.prod.outlook.com>
 <AM6PR08MB50781FE6FB08D76A2035032583470@AM6PR08MB5078.eurprd08.prod.outlook.com>
 <AM6PR08MB507809D6436B43083AF8306883280@AM6PR08MB5078.eurprd08.prod.outlook.com>
 <2b27b4ca-0370-442d-9f39-210265f00444@arm.com>
 <AM6PR08MB507895A65E1F0EE5780D39C983280@AM6PR08MB5078.eurprd08.prod.outlook.com>
 <20190412171613.GB23599@brightrain.aerifal.cx>
 <AM6PR08MB50783C3A091B45F16965261E832B0@AM6PR08MB5078.eurprd08.prod.outlook.com>,<20190415144051.GE23599@brightrain.aerifal.cx>
In-Reply-To: <20190415144051.GE23599@brightrain.aerifal.cx>
authentication-results: spf=none (sender IP is )
 smtp.mailfrom=Wilco.Dijkstra@arm.com; 
received-spf: None (protection.outlook.com: arm.com does not designate
 permitted sender hosts)
x-ms-exchange-senderadcheck: 1
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED

Hi Rich,

>> The O notation means O (256 N) is equivalent to O (N), ie. linear
>> performance in the worst case. And that is what matters for large inputs=
.
>
> I know what big O notation means. My point is that 256 is a
> sufficiently large number that, for practical purposes, the presence
> or absence of the M factor (whose value can be up to 256) in O(N) vs
> O(MN) is more significant than any small constant factors.

No, you can't gloss over one constant factor but emphasise another.
The constant factor is much smaller in my algorithm and actually gets
better (relatively) when you encounter bad cases. As a result worst
cases perform about the same even with needles of 256 characters.

>> The average runtime of my algorithm is actually sublinear. So in that
>> sense Two-Way performs pathologically bad.
>
> For strstr, it's never sublinear because it operates on
> null-terminated strings (but the linear term can have a very small
> constant because of fast incremental strnlen-like operation, if you
> want).=20

strstr requires strnlen of course but the main algorithm is still sublinear=
.
As a result performance improves with larger needles and the relative
amount of time spent in strnlen increases. Asymptotically it ends up
as fast as strnlen.

> For memmem, of course it can be sublinear. The widely used
> variant of twoway (used in both glibc and musl) does the jump table
> that makes it sublinear (or for strstr, smaller linear-term
> coefficient) already. If you're this unaware of what the current code
> is doing, you shouldn't be trying to replace it with something naive
> but rather trying to understand it and if/how it can be improved.

Well if you've actually seen the existing code, you'd know that it does
not use a skip table for all common cases. And when it uses one, it
actually slows down the worst case by an order of magnitude.
As a result Two-Way performs absolutely abysmally both in common
cases as well as worst cases. It's almost as if people designed it to be
as bad as possible!

Eg. normalised performance from GLIBC bench-strstr test:

                                          basic_strstr    twoway_strstr   s=
trstr(new)
Length 65536/ 16, alignment  1/11, found: 3.07            1.0             1=
1.0

Yes, twoway_strstr is 3 times SLOWER than a trivial quadratic loop...

>> Yes, without a reproducible example I can't see what your issue is. You
>> can't make it go quadratic because it simply isn't.
>
> Obviously it's not unbounded because you have a (very large) bound on
> the size, 256. I can make it do a 256-byte strcmp for nearly every
> byte of the input haystack. Maybe because of vectorization on some
> targets that's only 16x slower than the current code rather than 256x
> slower, but it's still a lot slower.

No you can't. It's impossible to make it do a full 256 byte memcmp every
character. And bad cases are not bad because of the time spent comparing
strings - they are bad because of mispredicted branches. So it's not possib=
le
to compare bad cases without benchmarking actual examples on modern
CPUs.

Wilco=