From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on starla X-Spam-Level: X-Spam-Status: No, score=0.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_BL_SPAMCOP_NET,SPF_HELO_PASS, SPF_PASS autolearn=no autolearn_force=no version=3.4.6 Received: from nue.mailmanlists.eu (nue.mailmanlists.eu [94.130.110.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 0BB8E1F44D for ; Tue, 9 Apr 2024 13:10:24 +0000 (UTC) Authentication-Results: dcvr.yhbt.net; dkim=pass (1024-bit key; secure) header.d=ml.ruby-lang.org header.i=@ml.ruby-lang.org header.a=rsa-sha256 header.s=mail header.b=ObGIvLLc; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.a=rsa-sha256 header.s=s1 header.b=GRhAt5QV; dkim-atps=neutral Received: from nue.mailmanlists.eu (localhost [127.0.0.1]) by nue.mailmanlists.eu (Postfix) with ESMTP id 1FDCD8420B; Tue, 9 Apr 2024 13:10:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ml.ruby-lang.org; s=mail; t=1712668216; bh=3Bpn0XCqXaZX/6yPrcuYwZ/bQZ2HPuczp9BKlkS8uI0=; h=Date:References:To:Reply-To:Subject:List-Id:List-Archive: List-Help:List-Owner:List-Post:List-Subscribe:List-Unsubscribe: From:Cc:From; b=ObGIvLLcspvJo/tW6/w2OtQXKugzaAuCyw4sE+Nlh8as1p/RvbD+obf5xlh0Ac4ke zGk56Kvgl/3sERsGxx7QMaGMd2rheB1vcLTLNeh6fpJ7iTq8c8Si+D4cDDLVSxKakv Q+Vs/+V4+62xUT4TFKfuzxb5cUKNVeHnsHeOqwNg= Received: from s.wrqvtvvn.outbound-mail.sendgrid.net (s.wrqvtvvn.outbound-mail.sendgrid.net [149.72.120.130]) by nue.mailmanlists.eu (Postfix) with ESMTPS id C756981399 for ; Tue, 9 Apr 2024 13:10:12 +0000 (UTC) Authentication-Results: nue.mailmanlists.eu; dkim=pass (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.a=rsa-sha256 header.s=s1 header.b=GRhAt5QV; dkim-atps=neutral DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ruby-lang.org; h=from:references:subject:mime-version:content-type: content-transfer-encoding:list-id:to:cc:content-type:from:subject:to; s=s1; bh=IQtarJT/59/F6OCeiEyBWhQCGDRlcaougejOLjQjdac=; b=GRhAt5QVFpOFPsAGZAZWCX9XaCx0nWz/TbqLrNdMUZiF8SfXma+5biUiJzyL+M9f5igl ubGgncMGdrMBvJ2/BxDccFDcxu0/tsRt4lRQdN6C/eptwU4+96DiOhhxQyURRjYFD3lPMV lxs8tH1HDMO9gdXoqt2l3mJ9n3xIw5WWE0K3IBf8VsD6dD5Px5mqtcVS6g40SCDQXUETCU 5G7a5peZnPyu2ZlKyYiVIXsL6WESsTC9zirG2JN5wbGeHzil2u/xLNDxhlW8JwOTqhhcLC m4GCEO7C635x6+9rHtT24vJzAEHsiLain0sVcGwxui6DkojwxLD76wD2iOmAiTlA== Received: by recvd-57769bb547-jgrlh with SMTP id recvd-57769bb547-jgrlh-1-66153E33-A 2024-04-09 13:10:11.280123519 +0000 UTC m=+1872496.148869636 Received: from herokuapp.com (unknown) by geopod-ismtpd-27 (SG) with ESMTP id jrilGX9_QxeiIXaOHPwvFg for ; Tue, 09 Apr 2024 13:10:11.219 +0000 (UTC) Date: Tue, 09 Apr 2024 13:10:11 +0000 (UTC) Message-ID: References: Mime-Version: 1.0 X-Redmine-Project: ruby-master X-Redmine-Issue-Tracker: Feature X-Redmine-Issue-Id: 20415 X-Redmine-Issue-Author: byroot X-Redmine-Issue-Priority: Normal X-Redmine-Sender: byroot X-Mailer: Redmine X-Redmine-Host: bugs.ruby-lang.org X-Redmine-Site: Ruby Issue Tracking System X-Auto-Response-Suppress: All Auto-Submitted: auto-generated X-Redmine-MailingListIntegration-Message-Ids: 94068 X-SG-EID: =?us-ascii?Q?u001=2EKmNZ1u3n1vIpO8NNTdp+Q9c0ai7potxbEDLMO7SOJO=2F4KkRUz0d23466m?= =?us-ascii?Q?naiq=2F5fmA4hb60MdRMUAwHZnjIWVFu=2FrqiBOz5c?= =?us-ascii?Q?nOvkBudsSTidkTBFLBbIreHiUG0zKhlFP4BgSsy?= =?us-ascii?Q?XQNvanwuBqKac+UKvQmr2Zp+8eYKVLVYibhaPx=2F?= =?us-ascii?Q?gL6=2F0su+L65SfVQxH4GD07Z5zD=2F8kZB5B9vuBDW?= =?us-ascii?Q?3rokPcMykBGm7AAGYQUU55C1lRDSui3Dno9RsqI?= =?us-ascii?Q?hTu=2FkgTCy2bw6+5SbLBJAv6xFg=3D=3D?= To: ruby-core@ml.ruby-lang.org X-Entity-ID: u001.I8uzylDtAfgbeCOeLBYDww== Message-ID-Hash: 2LP566HE2YJ72UVKPDUQWJ47VBBIAGAE X-Message-ID-Hash: 2LP566HE2YJ72UVKPDUQWJ47VBBIAGAE X-MailFrom: bounces+313651-b711-ruby-core=ml.ruby-lang.org@em5188.ruby-lang.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.3 Precedence: list Reply-To: Ruby developers Subject: [ruby-core:117473] [Ruby master Feature#20415] Precompute literal String hash code during compilation List-Id: Ruby developers Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: From: "byroot (Jean Boussier) via ruby-core" Cc: "byroot (Jean Boussier)" Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Issue #20415 has been updated by byroot (Jean Boussier). > if two Strings are interned it's not correct to compare them by identity,= because they could still be eql? with the same bytes but different encodin= gs.=20 I'm not sure I follow. Surely `x.eql?(x)` can use an identity check as a sh= ortcut. If both refs are equal, you can immediately return true. Of course = if refs aren't equal, then yes you need to do a full string comparison. So I'm a bit surprised to see TruffleRuby doesn't have the same performance= on that benchmark. ---------------------------------------- Feature #20415: Precompute literal String hash code during compilation https://bugs.ruby-lang.org/issues/20415#change-107861 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- I worked on a proof of concept with @etienne which I think has some potenti= al, but I'm looking for feedback on what would be the best implementation. The proof of concept is here: https://github.com/Shopify/ruby/pull/553 ### Idea Most string literals are relatively short, hence embedded, and have some wa= sted bytes at the end of their slot. We could use that wasted space to stor= e the string hash. The goal being to make **looking up a literal String key in a hash, as fast= as a Symbol key**. The goal isn't to memoize the hash code of all strings,= but to **only selectively precompute the hash code of literal strings in the compiler**. The compiler could even selectively do this when we lite= ral string is used to lookup a hash (`opt_aref`). Here's the benchmark we used: ```ruby hash =3D 10.times.to_h do |i| [i, i] end dyn_sym =3D "dynamic_symbol".to_sym hash[:some_symbol] =3D 1 hash[dyn_sym] =3D 1 hash["small"] =3D 2 hash["frozen_string_literal"] =3D 2 Benchmark.ips do |x| x.report("symbol") { hash[:some_symbol] } x.report("dyn_symbol") { hash[:some_symbol] } x.report("small_lit") { hash["small"] } x.report("frozen_lit") { hash["frozen_string_literal"] } x.compare!(order: :baseline) end ``` On Ruby 3.3.0, looking up a String key is a bit slower based on the key siz= e: ``` Calculating ------------------------------------- symbol 24.175M (=B1 1.7%) i/s - 122.002M in 5.048306s dyn_symbol 24.345M (=B1 1.6%) i/s - 122.019M in 5.013400s small_lit 21.252M (=B1 2.1%) i/s - 107.744M in 5.072042s frozen_lit 20.095M (=B1 1.3%) i/s - 100.489M in 5.001681s Comparison: symbol: 24174848.1 i/s dyn_symbol: 24345476.9 i/s - same-ish: difference falls within er= ror small_lit: 21252403.2 i/s - 1.14x slower frozen_lit: 20094766.0 i/s - 1.20x slower ``` With the proof of concept performance is pretty much identical: ``` Calculating ------------------------------------- symbol 23.528M (=B1 6.9%) i/s - 117.584M in 5.033231s dyn_symbol 23.777M (=B1 4.7%) i/s - 120.231M in 5.071734s small_lit 23.066M (=B1 2.9%) i/s - 115.376M in 5.006947s frozen_lit 22.729M (=B1 1.1%) i/s - 115.693M in 5.090700s Comparison: symbol: 23527823.6 i/s dyn_symbol: 23776757.8 i/s - same-ish: difference falls within er= ror small_lit: 23065535.3 i/s - same-ish: difference falls within er= ror frozen_lit: 22729351.6 i/s - same-ish: difference falls within er= ror ``` ### Possible implementation The reason I'm opening this issue early is to get feedback on which would b= e the best implementation. #### Store hashcode after the string terminator Right now the proof of concept simply stores the `st_index_t` after the str= ing null terminator, and only when the string is embedded and as enough lef= t over space. Strings with a precomputed hash are marked with an user flag. Pros: - Very simple implementation, no need to change a lot of code, and very e= asy to strip out if we want to. - Doesn't use any extra memory. If the string doesn't have enough left ov= er bytes, the optimization simply isn't applied. - The worst case overhead is a single `FL_TEST_RAW` in `rb_str_hash`. Cons: - The optimization won't apply to certain string sizes. e.g. strings betw= een `17` and `23` bytes won't have a precomputed hash code. - Extracting the hash code requires some not so nice pointer arithmetic. #### Create another RString union Another possibility would be to add another entry in the `RString` struct u= nion, such as we'd have: ```c struct RString { struct RBasic basic; long len; union { // ... existing members struct { st_index_t hash; char ary[1]; } embded_literal; } as; }; ``` Pros: - The optimization can now be applied to all string sizes. - The hashcode is always at the same offset and properly aligned. Cons: - Some strings would be bumped by one slot size, so would use marginally = more memory. - Complexify the code base more, need to modify a lot more string related= code (e.g. `RSTRING_PTR` and many others) - When compiling such string, if an equal string already exists in the `f= string` table, we'd need to replace it, we can't just mutate it in place to= add the hashcode. ### Prior art [Feature #15331] is somewhat similar in its idea, but it does it lazily for= all strings. Here it's much simpler because limited to string literals, wh= ich are the ones likely to be used as Hash keys, and the overhead is on com= pilation, not runtime (aside from a single flag check). So I think most of = the caveats of that original implementation don't apply here. --=20 https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-c= ore.ml.ruby-lang.org/