From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on starla X-Spam-Level: X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_HELO_PASS,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 Received: from nue.mailmanlists.eu (nue.mailmanlists.eu [IPv6:2a01:4f8:1c0c:6b10::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 2B0F21F44D for ; Tue, 9 Apr 2024 14:23:58 +0000 (UTC) Authentication-Results: dcvr.yhbt.net; dkim=pass (1024-bit key; secure) header.d=ml.ruby-lang.org header.i=@ml.ruby-lang.org header.a=rsa-sha256 header.s=mail header.b=mqGAhBWI; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.a=rsa-sha256 header.s=s1 header.b=WjZcsdx1; dkim-atps=neutral Received: from nue.mailmanlists.eu (localhost [127.0.0.1]) by nue.mailmanlists.eu (Postfix) with ESMTP id 790A98421B; Tue, 9 Apr 2024 14:23:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ml.ruby-lang.org; s=mail; t=1712672630; bh=xmsnQ6tmbnFV16jbMYXE63dzSrYR7wsyiUw2fQnuRDk=; h=Date:References:To:Reply-To:Subject:List-Id:List-Archive: List-Help:List-Owner:List-Post:List-Subscribe:List-Unsubscribe: From:Cc:From; b=mqGAhBWIZvJBUAfpSrMf3PvT0Rmws2kdqJVQVIMOKZhTuPVQoSMEyBJ1Rwx328Eap HKGG6FvaKZtMST747sJlZv/jSzXgky5mqVho7aWyX7M4UJYd3Q+AtFkn57Q5vAqlen bhze6NEo370ZO6m9Jj8jqjUqrduLpzD301r0UJMs= Received: from s.wfbtzhsw.outbound-mail.sendgrid.net (s.wfbtzhsw.outbound-mail.sendgrid.net [159.183.224.105]) by nue.mailmanlists.eu (Postfix) with ESMTPS id CA07084218 for ; Tue, 9 Apr 2024 14:23:46 +0000 (UTC) Authentication-Results: nue.mailmanlists.eu; dkim=pass (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.a=rsa-sha256 header.s=s1 header.b=WjZcsdx1; dkim-atps=neutral DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ruby-lang.org; h=from:references:subject:mime-version:content-type: content-transfer-encoding:list-id:to:cc:content-type:from:subject:to; s=s1; bh=1YMmjcPZNZAMn0J+ZICpG1GWxfrIZBx/zCIH2UsBYtI=; b=WjZcsdx1QmCvv2xsaVWHVh4J29Zok4JEZx3edokkuUj29/rEpjYpp4+JgkdZn9rL+95Z 9Q1nfbiH3sOzyVD+6bYwQvWKL6LlVIEiFqOWDXh0lo0LFtsyj1FyAwNLS7A+zRMnpnzjgR wdfGQYVZDyBgWuxjQmAdJz+cUAP0i/CFer76s4QJ8vQIzkNdGse8T/9SvF/K7p2xVMTZxi Sy8AlAio2AwqfqzMzSFKFT/hdcHW20I7osn0aiOK4i53N31wjVwmEH322eMTJH8SsQeTIC 2AuiMqbk5+tn5Qz/M4AmwtPVKX67VAqblH+rvIqpuxe58fc0W0eyDTDQIHrdTMkg== Received: by recvd-7555548f4d-5qgpj with SMTP id recvd-7555548f4d-5qgpj-1-66154F71-F 2024-04-09 14:23:45.769146104 +0000 UTC m=+1877030.131856287 Received: from herokuapp.com (unknown) by geopod-ismtpd-24 (SG) with ESMTP id dq4hf7onR1yt8vEYnrqKLA for ; Tue, 09 Apr 2024 14:23:45.739 +0000 (UTC) Date: Tue, 09 Apr 2024 14:23:45 +0000 (UTC) Message-ID: References: Mime-Version: 1.0 X-Redmine-Project: ruby-master X-Redmine-Issue-Tracker: Feature X-Redmine-Issue-Id: 20415 X-Redmine-Issue-Author: byroot X-Redmine-Issue-Priority: Normal X-Redmine-Sender: byroot X-Mailer: Redmine X-Redmine-Host: bugs.ruby-lang.org X-Redmine-Site: Ruby Issue Tracking System X-Auto-Response-Suppress: All Auto-Submitted: auto-generated X-Redmine-MailingListIntegration-Message-Ids: 94071 X-SG-EID: =?us-ascii?Q?u001=2EKmNZ1u3n1vIpO8NNTdp+Q9c0ai7potxbEDLMO7SOJO=2F4KkRUz0d23466m?= =?us-ascii?Q?naiq=2F5fmA4hb60MdRMUAwHZnjIWVFu=2FrqiBOz5c?= =?us-ascii?Q?nOvkBudsSTiaih8xsKCNpzHrYKYdQ3dPkFCp5Ss?= =?us-ascii?Q?BO2ZYhPpgzKBpi7Gl8imeqrgzluQwyrDaP0j3Oj?= =?us-ascii?Q?c34bOgF2SJHOYlFL75WRC4dTDOJD6ABHjdQtUYe?= =?us-ascii?Q?eDzJIU2pJT74+CEkyEIrioYJ=2FF=2FctkBJQ=2FJ81bn?= =?us-ascii?Q?V25UQCHiAxhGhGEul+7kj6YgOQ=3D=3D?= To: ruby-core@ml.ruby-lang.org X-Entity-ID: u001.I8uzylDtAfgbeCOeLBYDww== Message-ID-Hash: YKZIFJNH7C2DSTYSTUQXQ5KBSUUA4YTP X-Message-ID-Hash: YKZIFJNH7C2DSTYSTUQXQ5KBSUUA4YTP X-MailFrom: bounces+313651-b711-ruby-core=ml.ruby-lang.org@em5188.ruby-lang.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.3 Precedence: list Reply-To: Ruby developers Subject: [ruby-core:117476] [Ruby master Feature#20415] Precompute literal String hash code during compilation List-Id: Ruby developers Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: From: "byroot (Jean Boussier) via ruby-core" Cc: "byroot (Jean Boussier)" Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Issue #20415 has been updated by byroot (Jean Boussier). > Yes, if they are the same object of course they are eql?, but if they are= not the same object they can still be eql?, even if both are interned/fstr= ing/frozen string literals. That was my understanding. > due to using an array instead of buckets until 8 pairs IIRC Yes, that is why my benchmark use a larger Hash. ---------------------------------------- Feature #20415: Precompute literal String hash code during compilation https://bugs.ruby-lang.org/issues/20415#change-107864 * Author: byroot (Jean Boussier) * Status: Open ---------------------------------------- I worked on a proof of concept with @etienne which I think has some potenti= al, but I'm looking for feedback on what would be the best implementation. The proof of concept is here: https://github.com/Shopify/ruby/pull/553 ### Idea Most string literals are relatively short, hence embedded, and have some wa= sted bytes at the end of their slot. We could use that wasted space to stor= e the string hash. The goal being to make **looking up a literal String key in a hash, as fast= as a Symbol key**. The goal isn't to memoize the hash code of all strings,= but to **only selectively precompute the hash code of literal strings in the compiler**. The compiler could even selectively do this when we lite= ral string is used to lookup a hash (`opt_aref`). Here's the benchmark we used: ```ruby hash =3D 10.times.to_h do |i| [i, i] end dyn_sym =3D "dynamic_symbol".to_sym hash[:some_symbol] =3D 1 hash[dyn_sym] =3D 1 hash["small"] =3D 2 hash["frozen_string_literal"] =3D 2 Benchmark.ips do |x| x.report("symbol") { hash[:some_symbol] } x.report("dyn_symbol") { hash[:some_symbol] } x.report("small_lit") { hash["small"] } x.report("frozen_lit") { hash["frozen_string_literal"] } x.compare!(order: :baseline) end ``` On Ruby 3.3.0, looking up a String key is a bit slower based on the key siz= e: ``` Calculating ------------------------------------- symbol 24.175M (=B1 1.7%) i/s - 122.002M in 5.048306s dyn_symbol 24.345M (=B1 1.6%) i/s - 122.019M in 5.013400s small_lit 21.252M (=B1 2.1%) i/s - 107.744M in 5.072042s frozen_lit 20.095M (=B1 1.3%) i/s - 100.489M in 5.001681s Comparison: symbol: 24174848.1 i/s dyn_symbol: 24345476.9 i/s - same-ish: difference falls within er= ror small_lit: 21252403.2 i/s - 1.14x slower frozen_lit: 20094766.0 i/s - 1.20x slower ``` With the proof of concept performance is pretty much identical: ``` Calculating ------------------------------------- symbol 23.528M (=B1 6.9%) i/s - 117.584M in 5.033231s dyn_symbol 23.777M (=B1 4.7%) i/s - 120.231M in 5.071734s small_lit 23.066M (=B1 2.9%) i/s - 115.376M in 5.006947s frozen_lit 22.729M (=B1 1.1%) i/s - 115.693M in 5.090700s Comparison: symbol: 23527823.6 i/s dyn_symbol: 23776757.8 i/s - same-ish: difference falls within er= ror small_lit: 23065535.3 i/s - same-ish: difference falls within er= ror frozen_lit: 22729351.6 i/s - same-ish: difference falls within er= ror ``` ### Possible implementation The reason I'm opening this issue early is to get feedback on which would b= e the best implementation. #### Store hashcode after the string terminator Right now the proof of concept simply stores the `st_index_t` after the str= ing null terminator, and only when the string is embedded and as enough lef= t over space. Strings with a precomputed hash are marked with an user flag. Pros: - Very simple implementation, no need to change a lot of code, and very e= asy to strip out if we want to. - Doesn't use any extra memory. If the string doesn't have enough left ov= er bytes, the optimization simply isn't applied. - The worst case overhead is a single `FL_TEST_RAW` in `rb_str_hash`. Cons: - The optimization won't apply to certain string sizes. e.g. strings betw= een `17` and `23` bytes won't have a precomputed hash code. - Extracting the hash code requires some not so nice pointer arithmetic. #### Create another RString union Another possibility would be to add another entry in the `RString` struct u= nion, such as we'd have: ```c struct RString { struct RBasic basic; long len; union { // ... existing members struct { st_index_t hash; char ary[1]; } embded_literal; } as; }; ``` Pros: - The optimization can now be applied to all string sizes. - The hashcode is always at the same offset and properly aligned. Cons: - Some strings would be bumped by one slot size, so would use marginally = more memory. - Complexify the code base more, need to modify a lot more string related= code (e.g. `RSTRING_PTR` and many others) - When compiling such string, if an equal string already exists in the `f= string` table, we'd need to replace it, we can't just mutate it in place to= add the hashcode. ### Prior art [Feature #15331] is somewhat similar in its idea, but it does it lazily for= all strings. Here it's much simpler because limited to string literals, wh= ich are the ones likely to be used as Hash keys, and the overhead is on com= pilation, not runtime (aside from a single flag check). So I think most of = the caveats of that original implementation don't apply here. --=20 https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-c= ore.ml.ruby-lang.org/