Re: [PATCH v2] Improve performance of strstr

unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Rich Felker <dalias@libc.org>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>,
	Zack Weinberg <zackw@panix.com>, nd <nd@arm.com>,
	GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH v2] Improve performance of strstr
Date: Tue, 16 Apr 2019 22:22:05 -0400	[thread overview]
Message-ID: <20190417022205.GJ23599@brightrain.aerifal.cx> (raw)
In-Reply-To: <AM6PR08MB5078A16246C1B19D8E19EAEE83250@AM6PR08MB5078.eurprd08.prod.outlook.com>

On Wed, Apr 17, 2019 at 12:54:05AM +0000, Wilco Dijkstra wrote:
> Hi Rich,
> 
> On Tue, Apr 16, 2019 at 11:49:43PM +0000, Wilco Dijkstra wrote:
> > Hi Rich,
> > 
> > > Ultimately the 3 key questions are:
> > > 
> > > 1. Is it linear time even on huge inputs?
> > > 2. Is it faster than Two-way in the average case?
> > > 3. Is it within a small factor even in the worst case?
> > > 
> > > The first 2 were never in doubt, and your result proves 3 as well.
> > 
> > > His result was up to 40x slower, which refutes #3.
> > 
> > No that was incorrectly comparing a fast case with a slow case. You can
> > easily show 1000x difference between fast and slow cases, but what we're
> > interested in is how the slow cases compare.
> 
> > No, the 40x was comparing the same input -- that is, there's an input
> > for which your patch makes it 40x slower.
> 
> The same input can be fast for one algorithm and slow on another. Is that
> hard to understand? Like I said it can be 1000 times. And it goes both ways
> so it doesn't matter at all.
> 
> > You can find some inputs which are mildly slow for the current two-way
> > implementation (likely due to a mix of fundamental reasons and
> > implementation flaws), and normalized for size (a poor normalization
> > but whatever), searching for them takes significantly more than 1/40
> > the time that your code takes on its worst-case. This is really not a
> 
> Yes it's very common to get bad factorizations, but every now and again it
> gets one that starts with an uncommon character and it ends up really fast.
> That huge disparity means it's a bad algorithm. 

No it doesn't. It's unfortunate that the factorization depends on the
choice of order on the alphabet. This can be mitigated by ordering the
alphabet according to first or last appearance in the needle, and
indeed doing so tends to improve things. But even without that, the
difference between a "good" and "bad" choice is small. When it's not
small, the cause is inefficient code paths in the implementation, not
anything fundamental to the algorithm.

> Yes the implementation is insanely stupid with the large and small needle
> code reversed. For small needles you simply want to use a skip table since
> factorization can't do anything useful anyway.

Arguably, on an arch with vector registers, this is roughly correct,
for glibc's large/small cutoff of 32 bytes. However keep in mind that
in a worst case the skip table is always useless. The way you make an
efficient strstr for needles that fit in registers is just rotating in
a byte at a time and comparing the whole register. I'm not sure if you
can tack the skip table onto this in a way that works well.

> For large needles you can get
> reasonable factorizations and so a skip table actually slows things down.

Unfortunately the memset to make the skip table is somewhat costly for
small needles, in terms of blowing cache lines even if not raw cycles
under an artificial benchmark. musl's version mitigates this with a
bit array marking the valid entries rather than initializing them all,
but it might be better (if you don't mind code size, which glibc
doesn't) to have a different version for needles <256 bytes in which
each skip entry is a single byte.

> > meaningful comparison. What's meaningful is that some operations
> > become 40x slower for no good reason except your irrational refusal to
> > improve the implementation of the good algorithm rather than reverting
> > to something backwards.
> 
> A 40x slowdown vs a rare lucky fast case for Two-way is not in any way

It's not a "lucky" fast case. It's the classic pattern of cases that
are awful for O(nm) algorithms. That a non-idiotic algorithm isn't
slow on this is not luck. It's the whole point.

> meaningful. Being 11 times faster than the optimized Two-way on the
> average case certainly is.

It's not 11x faster, and the two-way in glibc is not at all optimized,
but does lots of things in glaringly, gratuitously slow ways.

Rich

next prev parent reply	other threads:[~2019-04-17  2:22 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-31 21:01 [PATCH v2] Improve performance of strstr Wilco Dijkstra
2018-12-31 21:24 ` Rich Felker
2019-01-02 13:40   ` Wilco Dijkstra
2019-02-01 15:40 ` Wilco Dijkstra
2019-03-06 16:55   ` Wilco Dijkstra
2019-03-18 17:17     ` Wilco Dijkstra
2019-04-12 15:50       ` Wilco Dijkstra
2019-04-12 16:19         ` Szabolcs Nagy
2019-04-12 16:49           ` Wilco Dijkstra
2019-04-12 17:16             ` Rich Felker
2019-04-15 10:24               ` Wilco Dijkstra
2019-04-15 14:40                 ` Rich Felker
2019-04-15 18:02                   ` Wilco Dijkstra
2019-04-15 20:15                     ` Zack Weinberg
2019-04-16  0:13                       ` Rich Felker
2019-04-16 16:40                       ` Szabolcs Nagy
2019-04-16 17:01                         ` Szabolcs Nagy
2019-04-16 18:07                           ` Szabolcs Nagy
2019-04-16 20:56                             ` Rich Felker
2019-04-17 14:24                               ` Szabolcs Nagy
2019-04-17 15:10                                 ` Rich Felker
2019-04-18 13:51                                 ` Wilco Dijkstra
2019-04-18 15:20                                   ` Rich Felker
2019-04-18 16:16                                     ` Szabolcs Nagy
2019-04-18 16:37                                       ` Wilco Dijkstra
2019-04-18 17:02                                         ` Rich Felker
2019-04-16 22:52                             ` Wilco Dijkstra
2019-04-16 23:40                               ` Rich Felker
2019-04-16 23:49                                 ` Wilco Dijkstra
2019-04-17  0:37                                   ` Rich Felker
2019-04-17  0:54                                     ` Wilco Dijkstra
2019-04-17  2:22                                       ` Rich Felker [this message]
2019-04-17 15:06                               ` Wilco Dijkstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190417022205.GJ23599@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=Szabolcs.Nagy@arm.com \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=libc-alpha@sourceware.org \
    --cc=nd@arm.com \
    --cc=zackw@panix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).