Re: [PATCH v2] Improve performance of strstr

unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
To: Zack Weinberg <zackw@panix.com>, Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: nd <nd@arm.com>, Rich Felker <dalias@libc.org>,
	GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH v2] Improve performance of strstr
Date: Tue, 16 Apr 2019 18:07:37 +0000	[thread overview]
Message-ID: <e9fe9dd8-8204-311b-9001-9e50d840ae1b@arm.com> (raw)
In-Reply-To: <953ad13a-cd71-32c4-ccc3-d1b8cac57620@arm.com>

On 16/04/2019 18:01, Szabolcs Nagy wrote:
> On 16/04/2019 17:40, Szabolcs Nagy wrote:
>> On 15/04/2019 21:15, Zack Weinberg wrote:
>>> On Mon, Apr 15, 2019 at 2:02 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>>>> Hi Rich,
>>> ...
>>>>>> Yes, without a reproducible example I can't see what your issue is. You
>>>>>> can't make it go quadratic because it simply isn't.
>>>>>
>>>>> Obviously it's not unbounded because you have a (very large) bound on
>>>>> the size, 256. I can make it do a 256-byte strcmp for nearly every
>>>>> byte of the input haystack. Maybe because of vectorization on some
>>>>> targets that's only 16x slower than the current code rather than 256x
>>>>> slower, but it's still a lot slower.
>>>>
>>>> No you can't. It's impossible to make it do a full 256 byte memcmp every
>>>> character. And bad cases are not bad because of the time spent comparing
>>>> strings - they are bad because of mispredicted branches. So it's not possible
>>>> to compare bad cases without benchmarking actual examples on modern
>>>> CPUs.
>>>
>>> This discussion has been going in circles for quite some time now.
>>>
>>> Wilco, Rich, I think it would help a lot if you could BOTH write down
>>> some example needle and haystack pairs that you believe will
>>> demonstrate significantly improved performance with your preferred
>>> algorithm, and/or pathologically slow performance with your
>>> dispreferred algorithm.  Even without specific numbers, that will give
>>> everyone something concrete to argue over, at least.
>>
>> since the new algorithm tries to look at the last two chars first
>> i'd say a possible bad case for it is
>>
>> needle   =  250*"a" + "abbbaa"
>> haystack = (250*"a" + "bbbbaa") * 1000
>>
>> (256 is the longest needle for the new algo, checking in a 256k
>> haystack should be large enough to see matching performance
>> instead of setup time)
>>
>> i think this should be close to worst case, but i haven't done
>> a detailed analysis, the regression with the new algorithm is
>>
>> 16.0x slower on Cortex-A72
>> 17.8x slower on Cortex-A57
> 
> scratch that, with
> 
> needle   =  248*"a" + "abbbbaaa"
> haystack = (248*"a" + "bbbbbaaa") * 1000
> 
> i get
> 
> 37.8x slower on Cortex-A72
> 40.0x slower on Cortex-A57

this is a good case for twoway, so we need a twoway worst case too
for a real comparision: comparing the worst for new vs worst for
twoway i've seen so far, new is

4.5x slower on Cortex-A72
2.7x slower on Cortex-A57

but there is no guarantee that my inputs are near the real worst
cases, it's hard to tell (and clearly uarch matters too).

(i wanted to avoid diving into the algorithm details to find
worst cases)

> 
> i didn't expect such difference compared to the previous case,
> i guess i would have to do a more detailed analysis.
> 
>>
>> (for a fair comparison, the worst-case of the new code should
>> be compared against the worst-case of the old code as well as
>> comparing over common inputs for the average case)
>>
>> (the claimed average case improvement is 3.7x on Cortex-A72)
>>
>> if we are afraid the worst-case will be used for denial of
>> service attack, then i think such slowdown is not enough to
>> cause problems (and requires the control of both haystack and
>> needle).
>>
>> if we are afraid of hitting bad cases in practice, then the
>> heuristic tweaks in the new matching logic should reduce the
>> probability of bad cases with real needle/haystack. (but it
>> does not prevent them, so on interactive systems one might
>> occasionally experience larger latency than before)
>>
>

next prev parent reply	other threads:[~2019-04-16 18:07 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-31 21:01 [PATCH v2] Improve performance of strstr Wilco Dijkstra
2018-12-31 21:24 ` Rich Felker
2019-01-02 13:40   ` Wilco Dijkstra
2019-02-01 15:40 ` Wilco Dijkstra
2019-03-06 16:55   ` Wilco Dijkstra
2019-03-18 17:17     ` Wilco Dijkstra
2019-04-12 15:50       ` Wilco Dijkstra
2019-04-12 16:19         ` Szabolcs Nagy
2019-04-12 16:49           ` Wilco Dijkstra
2019-04-12 17:16             ` Rich Felker
2019-04-15 10:24               ` Wilco Dijkstra
2019-04-15 14:40                 ` Rich Felker
2019-04-15 18:02                   ` Wilco Dijkstra
2019-04-15 20:15                     ` Zack Weinberg
2019-04-16  0:13                       ` Rich Felker
2019-04-16 16:40                       ` Szabolcs Nagy
2019-04-16 17:01                         ` Szabolcs Nagy
2019-04-16 18:07                           ` Szabolcs Nagy [this message]
2019-04-16 20:56                             ` Rich Felker
2019-04-17 14:24                               ` Szabolcs Nagy
2019-04-17 15:10                                 ` Rich Felker
2019-04-18 13:51                                 ` Wilco Dijkstra
2019-04-18 15:20                                   ` Rich Felker
2019-04-18 16:16                                     ` Szabolcs Nagy
2019-04-18 16:37                                       ` Wilco Dijkstra
2019-04-18 17:02                                         ` Rich Felker
2019-04-16 22:52                             ` Wilco Dijkstra
2019-04-16 23:40                               ` Rich Felker
2019-04-16 23:49                                 ` Wilco Dijkstra
2019-04-17  0:37                                   ` Rich Felker
2019-04-17  0:54                                     ` Wilco Dijkstra
2019-04-17  2:22                                       ` Rich Felker
2019-04-17 15:06                               ` Wilco Dijkstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e9fe9dd8-8204-311b-9001-9e50d840ae1b@arm.com \
    --to=szabolcs.nagy@arm.com \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=dalias@libc.org \
    --cc=libc-alpha@sourceware.org \
    --cc=nd@arm.com \
    --cc=zackw@panix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).