Date: Tue, 11 Aug 2020 15:16:13 +0000 (UTC)
From: jean.boussier@gmail.com
To: ruby-core@ruby-lang.org
Subject: [ruby-core:99559] [Ruby master Feature#17115] Optimize String#casecmp? for ASCII strings

Issue #17115 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #17115: Optimize String#casecmp? for ASCII strings
https://bugs.ruby-lang.org/issues/17115

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
Patch: https://github.com/ruby/ruby/pull/3369

`casecmp?` is kind of a performance trap, as it is much slower than using a case-insensitive regexp or just `casecmp == 0`.

```
require "benchmark/ips"

str = "Connection"
cmp = "connection"
Benchmark.ips do |x|
  x.report('/\A\z/i.match?') { /\Afoo\Z/i.match?(str) }
  x.report('casecmp?') { cmp.casecmp?(str) }
  x.report('casecmp') { cmp.casecmp(str) == 0 }
  x.compare!
end

Calculating -------------------------------------
      /\A\z/i.match?     11.447M (± 1.3%) i/s -     57.814M in   5.051489s
            casecmp?      6.197M (± 0.9%) i/s -     31.138M in   5.025252s
             casecmp     12.753M (± 1.2%) i/s -     64.636M in   5.069195s

Comparison:
             casecmp: 12752791.6 i/s
      /\A\z/i.match?: 11446996.1 i/s - 1.11x  (± 0.00) slower
            casecmp?:  6196886.0 i/s - 2.06x  (± 0.00) slower
```

This is because, contrary to the others, it tries to be correct with regard to Unicode case folding.

However, there are cases where a fast case-insensitive equality check of known ASCII strings is useful, for instance when matching HTTP headers.

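To illustrate the difference in semantics described above, here is a minimal sketch. The helper `header_named?` is a hypothetical name used only for this example and is not part of the patch; the non-ASCII comparisons follow the examples given in the Ruby documentation for `String#casecmp` and `String#casecmp?`.

```ruby
# Hypothetical helper (not part of the patch): case-insensitive
# match of an HTTP header name against an expected ASCII name.
def header_named?(name, expected)
  name.casecmp?(expected)
end

header_named?("Connection", "connection")    # true
header_named?("Connection", "Content-Type")  # false

# casecmp only folds ASCII letters, so non-ASCII letters differ by case...
ascii_only_equal = "äöü".casecmp("ÄÖÜ") == 0  # false

# ...while casecmp? applies Unicode case folding before comparing.
unicode_equal = "äöü".casecmp?("ÄÖÜ")         # true
```

For strings that are known to be ASCII, such as header names, both methods give the same answer, which is why a faster ASCII path for `casecmp?` does not change results.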
This patch checks whether both strings use a single-byte encoding and, if so, delegates most of the work to strncasecmp(3).

This makes `casecmp?` slightly faster than `casecmp == 0` when both strings are ASCII.

```
|                        |compare-ruby|built-ruby|
|:-----------------------|-----------:|---------:|
|casecmp-1               |     11.618M|   10.757M|
|                        |       1.08x|         -|
|casecmp-10              |      1.849M|    1.723M|
|                        |       1.07x|         -|
|casecmp-100             |    204.490k|  186.798k|
|                        |       1.09x|         -|
|casecmp-1000            |     20.413k|   20.184k|
|                        |       1.01x|         -|
|casecmp-nonascii1       |     19.541M|   20.100M|
|                        |           -|     1.03x|
|casecmp-nonascii10      |     19.489M|   19.914M|
|                        |           -|     1.02x|
|casecmp-nonascii100     |     19.479M|   20.155M|
|                        |           -|     1.03x|
|casecmp-nonascii1000    |     19.462M|   20.064M|
|                        |           -|     1.03x|
|casecmp_p-1             |      2.214M|   12.030M|
|                        |           -|     5.43x|
|casecmp_p-10            |      1.373M|    2.150M|
|                        |           -|     1.57x|
|casecmp_p-100           |    249.292k|  231.041k|
|                        |       1.08x|         -|
|casecmp_p-1000          |     16.173k|   23.592k|
|                        |           -|     1.46x|
|casecmp_p-nonascii1     |    651.921k|  650.572k|
|                        |       1.00x|         -|
|casecmp_p-nonascii10    |    108.253k|  109.006k|
|                        |           -|     1.01x|
|casecmp_p-nonascii100   |     11.749k|   11.889k|
|                        |           -|     1.01x|
|casecmp_p-nonascii1000  |      1.140k|    1.138k|
```

-- 
https://bugs.ruby-lang.org/