ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:99559] [Ruby master Feature#17115] Optimize String#casecmp? for ASCII strings
@ 2020-08-11 15:16 jean.boussier
  2020-08-14 15:14 ` [ruby-core:99588] " daniel
  0 siblings, 1 reply; 2+ messages in thread
From: jean.boussier @ 2020-08-11 15:16 UTC (permalink / raw)
  To: ruby-core

Issue #17115 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #17115: Optimize String#casecmp? for ASCII strings
https://bugs.ruby-lang.org/issues/17115

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
Patch: https://github.com/ruby/ruby/pull/3369

`casecmp?` is something of a performance trap: it is much slower than using a case-insensitive regexp or just `casecmp == 0`.

```
str = "Connection"
cmp = "connection"
Benchmark.ips do |x|
  x.report('/\A\z/i.match?') { /\Afoo\Z/i.match?(str) }
  x.report('casecmp?') { cmp.casecmp?(str) }
  x.report('casecmp') { cmp.casecmp(str) == 0 }
  x.compare!
end
Calculating -------------------------------------
      /\A\z/i.match?     11.447M (± 1.3%) i/s -     57.814M in   5.051489s
            casecmp?      6.197M (± 0.9%) i/s -     31.138M in   5.025252s
             casecmp     12.753M (± 1.2%) i/s -     64.636M in   5.069195s

Comparison:
             casecmp: 12752791.6 i/s
      /\A\z/i.match?: 11446996.1 i/s - 1.11x  (± 0.00) slower
            casecmp?:  6196886.0 i/s - 2.06x  (± 0.00) slower
```

This is because, unlike the others, it tries to be correct with regard to Unicode case folding.
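
To illustrate the difference in behaviour, here is a minimal sketch comparing the two methods on a non-ASCII pair. The variable names are only for illustration, and the strings are written with escapes so the example does not depend on the source file's encoding.

```ruby
lower = "\u00e4\u00f6\u00fc" # a-umlaut, o-umlaut, u-umlaut (lowercase)
upper = "\u00c4\u00d6\u00dc" # the same three letters in uppercase

# casecmp only folds the ASCII letters A-Z/a-z, so these compare as different.
ascii_only_result = lower.casecmp(upper)

# casecmp? applies Unicode case folding first, so these compare as equal.
unicode_result = lower.casecmp?(upper)

puts "casecmp:  #{ascii_only_result.inspect}"
puts "casecmp?: #{unicode_result.inspect}"
```

For pure ASCII input such as "Connection" and "connection", both approaches agree, which is the case the proposal targets.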

However, there are cases where a fast case-insensitive equality check of known ASCII strings is useful, for instance when matching HTTP headers.
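
As a rough illustration of the HTTP header use case, the sketch below looks up a header by name regardless of case. The `header_value` helper and the sample headers are hypothetical and not taken from any particular library.

```ruby
# Hypothetical helper: return the value of the first header whose name
# matches `name` case-insensitively, or nil when there is no such header.
def header_value(headers, name)
  pair = headers.find { |key, _value| key.casecmp?(name) }
  pair && pair[1]
end

headers = {
  "Content-Type" => "text/html",
  "CONNECTION"   => "keep-alive"
}

puts header_value(headers, "connection").inspect
puts header_value(headers, "Host").inspect
```

Header names are plain ASCII, so a lookup like this pays for Unicode case folding on every comparison without benefiting from it.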

This patch checks whether both strings use a single-byte encoding and, if so, delegates most of the work to strncasecmp(3).

This makes `casecmp?` slightly faster than `casecmp == 0` when both strings are ASCII.

```
|                        |compare-ruby|built-ruby|
|:-----------------------|-----------:|---------:|
|casecmp-1               |     11.618M|   10.757M|
|                        |       1.08x|         -|
|casecmp-10              |      1.849M|    1.723M|
|                        |       1.07x|         -|
|casecmp-100             |    204.490k|  186.798k|
|                        |       1.09x|         -|
|casecmp-1000            |     20.413k|   20.184k|
|                        |       1.01x|         -|
|casecmp-nonascii1       |     19.541M|   20.100M|
|                        |           -|     1.03x|
|casecmp-nonascii10      |     19.489M|   19.914M|
|                        |           -|     1.02x|
|casecmp-nonascii100     |     19.479M|   20.155M|
|                        |           -|     1.03x|
|casecmp-nonascii1000    |     19.462M|   20.064M|
|                        |           -|     1.03x|
|casecmp_p-1             |      2.214M|   12.030M|
|                        |           -|     5.43x|
|casecmp_p-10            |      1.373M|    2.150M|
|                        |           -|     1.57x|
|casecmp_p-100           |    249.292k|  231.041k|
|                        |       1.08x|         -|
|casecmp_p-1000          |     16.173k|   23.592k|
|                        |           -|     1.46x|
|casecmp_p-nonascii1     |    651.921k|  650.572k|
|                        |       1.00x|         -|
|casecmp_p-nonascii10    |    108.253k|  109.006k|
|                        |           -|     1.01x|
|casecmp_p-nonascii100   |     11.749k|   11.889k|
|                        |           -|     1.01x|
|casecmp_p-nonascii1000  |      1.140k|    1.138k|
|  
```



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>


* [ruby-core:99588] [Ruby master Feature#17115] Optimize String#casecmp? for ASCII strings
  2020-08-11 15:16 [ruby-core:99559] [Ruby master Feature#17115] Optimize String#casecmp? for ASCII strings jean.boussier
@ 2020-08-14 15:14 ` daniel
  0 siblings, 0 replies; 2+ messages in thread
From: daniel @ 2020-08-14 15:14 UTC (permalink / raw)
  To: ruby-core

Issue #17115 has been updated by Dan0042 (Daniel DeLorme).


In the benchmark, you'd need to change the regexp from `/\Afoo\Z/i` to `/\Aconnection\z/i`, since the `foo` pattern never matches `str`. If you do so, you'll find the regexp performance is similar to `casecmp?`.
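
A minimal sketch of the adjusted comparison, using the standard library's `Benchmark.realtime` rather than `Benchmark.ips` so it runs without extra gems. The iteration count and labels are arbitrary choices, and absolute timings will vary by machine and Ruby version.

```ruby
require "benchmark"

str = "Connection"
cmp = "connection"
n = 200_000 # arbitrary iteration count for this sketch

# The pattern from the original report never matches `str`, so it mostly
# measures how quickly a match attempt fails.
original_pattern = /\Afoo\Z/i
corrected_pattern = /\Aconnection\z/i

timings = {
  "original regexp"  => Benchmark.realtime { n.times { original_pattern.match?(str) } },
  "corrected regexp" => Benchmark.realtime { n.times { corrected_pattern.match?(str) } },
  "casecmp?"         => Benchmark.realtime { n.times { cmp.casecmp?(str) } },
  "casecmp == 0"     => Benchmark.realtime { n.times { cmp.casecmp(str) == 0 } }
}

timings.each { |label, seconds| puts format("%-18s %.4fs", label, seconds) }
```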

+1 for special-casing ASCII strings though.

Related: #13750, #14055

----------------------------------------
Feature #17115: Optimize String#casecmp? for ASCII strings
https://bugs.ruby-lang.org/issues/17115#change-87065

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
Patch: https://github.com/ruby/ruby/pull/3369

`casecmp?` is something of a performance trap: it is much slower than using a case-insensitive regexp or just `casecmp == 0`.

This is because, unlike the others, it tries to be correct with regard to Unicode case folding.

However, there are cases where a fast case-insensitive equality check of known ASCII strings is useful, for instance when matching HTTP headers.

This patch checks whether both strings use a single-byte encoding and, if so, does a simple iterative comparison with `TOLOWER()`.
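
The idea can be sketched at the Ruby level as follows. This is not the patch itself, which is written in C: `ascii_casecmp?` and `ascii_downcase` are hypothetical helpers, and `ascii_only?` is used here as an approximation of the encoding check.

```ruby
# Map the bytes for "A".."Z" to "a".."z"; leave every other byte unchanged.
def ascii_downcase(byte)
  byte.between?(65, 90) ? byte + 32 : byte
end

# Hypothetical helper: byte-wise comparison for ASCII strings, falling back
# to the Unicode-aware String#casecmp? for everything else.
def ascii_casecmp?(a, b)
  return a.casecmp?(b) unless a.ascii_only? && b.ascii_only?
  return false unless a.bytesize == b.bytesize

  # Compare one byte at a time, folding only the ASCII letters.
  (0...a.bytesize).all? do |i|
    ascii_downcase(a.getbyte(i)) == ascii_downcase(b.getbyte(i))
  end
end

puts ascii_casecmp?("Connection", "connection")
puts ascii_casecmp?("Connection", "Keep-Alive")
```

The fast path avoids allocating case-folded copies of both strings, which is presumably where most of the saving for short strings comes from.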

This makes `casecmp?` slightly faster than `casecmp == 0` when both strings are ASCII.




