From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-2.9 required=3.0 tests=AWL,BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_BL_SPAMCOP_NET,SPF_HELO_PASS, SPF_PASS,UNPARSEABLE_RELAY shortcircuit=no autolearn=no autolearn_force=no version=3.4.2 Received: from nue.mailmanlists.eu (nue.mailmanlists.eu [IPv6:2a01:4f8:1c0c:6b10::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id CF28E1F910 for ; Sat, 26 Nov 2022 13:06:22 +0000 (UTC) Authentication-Results: dcvr.yhbt.net; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.b="gC5EDRGK"; dkim-atps=neutral Received: from nue.mailmanlists.eu (localhost [127.0.0.1]) by nue.mailmanlists.eu (Postfix) with ESMTP id 9A5AC7E678; Sat, 26 Nov 2022 13:06:08 +0000 (UTC) Authentication-Results: nue.mailmanlists.eu; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.a=rsa-sha256 header.s=s1 header.b=gC5EDRGK; dkim-atps=neutral Received: from xtrwkhkc.outbound-mail.sendgrid.net (xtrwkhkc.outbound-mail.sendgrid.net [167.89.16.28]) by nue.mailmanlists.eu (Postfix) with ESMTPS id 46E0D7E66F for ; Sat, 26 Nov 2022 13:06:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ruby-lang.org; h=from:references:subject:mime-version:content-type: content-transfer-encoding:list-id:to:cc; s=s1; bh=HLo77/MIq8sMKq0aC5vNSKJC+1l0zJjnRrzMPGAOeVI=; b=gC5EDRGKLTpZwh16Wm+si6rGjnBZR/EhDRbsxDsdv+NUNYGr7GeJi6lgI9jauBpC/qXr mdO6HKeNrQRXsUxpA7nspCMpypVmHVFHTcuiz8JTDEwJY521jUWFReDYKD8AYaZoe8XC7W Io0cId9Fkn851zNfC9vp2baC3cwpRhawVeFttVUxDsFZokc87hGHpBV3vGHQQRmWlJDr59 ye3pmBuZLmrZZxgfvxjJQjD4rBZA3giIrj3aaxVA3RCabAijueSxnyPO/KCBWVPealtyw9 YKB1GBCbNzrtPLuD/+BYllf1roCnZyRgw+XpY+z1PPof+QAG0CuRi9tuw1WseqqQ== Received: by filterdrecv-69c5db5cf4-7l2sn with SMTP id filterdrecv-69c5db5cf4-7l2sn-1-63820F3A-1D 2022-11-26 13:06:02.72968486 +0000 UTC m=+741474.191430281 Received: from herokuapp.com (unknown) by geopod-ismtpd-6-6 (SG) with ESMTP id RtHixvoJS9mUxh7dpQtB6A for ; Sat, 26 Nov 2022 13:06:02.584 +0000 (UTC) Date: Sat, 26 Nov 2022 13:06:02 +0000 (UTC) From: "Eregon (Benoit Daloze)" Message-ID: References: Mime-Version: 1.0 X-Redmine-Project: ruby-master X-Redmine-Issue-Tracker: Feature X-Redmine-Issue-Id: 19090 X-Redmine-Issue-Author: k0kubun X-Redmine-Sender: Eregon X-Mailer: Redmine X-Redmine-Host: bugs.ruby-lang.org X-Redmine-Site: Ruby Issue Tracking System X-Auto-Response-Suppress: All Auto-Submitted: auto-generated X-Redmine-MailingListIntegration-Message-Ids: 87367 X-SG-EID: =?us-ascii?Q?DvL3W2Xo+Vk=2FeUn3F50L=2FNc8u9NqZCnbE0mXZHiyye4R1YZg=2FtAFc0SAFzpcS=2F?= =?us-ascii?Q?Zcr7BqPN=2FYCcMZBMITNeIci9STxYay0JAJWPbek?= =?us-ascii?Q?rFR+qVAy8z3IKD6URnyuCUF8AUsVBn5V+V0aUDV?= =?us-ascii?Q?15b6ieRwiLuBWNByCT+RKlgRHC39FRC4qxRj4yn?= =?us-ascii?Q?O4ioDRC8Qd=2FAL7kZ6UzmmSZ=2FERpOyQfoISAwHYP?= =?us-ascii?Q?Tn2+Thep3g6F8gav+CkLdD2LOuP8Yql3Lj3BLQe?= =?us-ascii?Q?62SD3uzynEyjrv=2Fi7nrOQ=3D=3D?= To: ruby-core@ml.ruby-lang.org X-Entity-ID: b/2+PoftWZ6GuOu3b0IycA== Message-ID-Hash: KRA3HADTULPYER2YEGBLGPNCLYXMZ3FO X-Message-ID-Hash: KRA3HADTULPYER2YEGBLGPNCLYXMZ3FO X-MailFrom: bounces+313651-b711-ruby-core=ml.ruby-lang.org@em5188.ruby-lang.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.3 Precedence: list Reply-To: Ruby developers Subject: [ruby-core:111021] [Ruby master Feature#19090] Do not duplicate an unescaped string in CGI.escapeHTML List-Id: Ruby developers Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Issue #19090 has been updated by Eregon (Benoit Daloze). I did not expect `rb_str_dup()` is so costly on CRuby, I guess the allocation is slow and of course CRuby can't escape-analyze it. I think it is important to keep the optimized HTML escape in core/stdlib (e.g., in CGI), as the fastest way is implementation-dependent. See https://bugs.ruby-lang.org/issues/19102#note-4 I think we should add an optional argument to avoid copying the string when unchanged, that's easy and avoids yet another place/method with that escaping logic. ---------------------------------------- Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML https://bugs.ruby-lang.org/issues/19090#change-100276 * Author: k0kubun (Takashi Kokubun) * Status: Closed * Priority: Normal ---------------------------------------- ## Proposal Stop guaranteeing that `GGI.escapeHTML` returns a new string even if there's nothing to be escaped. More specifically, stop calling this `rb_str_dup` https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped. ## Background My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why `rb_str_dup` was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original `gsub`-based implementation always returns a new object. As a result, even while many people use `CGI.escapeHTML` as an optimized implementation for escaping HTML today, it ended up having a compromised performance. ## Motivation The motivation is to improve performance. By just doing so, escaping a pre-allocated `"string"` becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f. The most major use case of `CGP.escapeHTML` is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of `'"&<>` characters. So we should stop wasting that to optimize that case. [Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with `gem-codesearch` today. The only reason to maintain the current behavior would be to allow using a return value of `CGI.escapeHTML` as a buffer for creating another longer string starting with the escaped value, but using `CGI.escapeHTML` to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal. ## Why not an optional flag like `CGI.escapeHTML(str, dup: false)`? Two reasons: * The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using `CGI.escapeHTML` less readable just for maintaining a use case that doesn't exist. * Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use `Primitive` to address that, but this is a default gem and we can't use that. * We could workaround that if we choose `CGI.escapeHTML(str, false)`, but again it'd spoil the readability for maintaining an invalid use case. ## Why not a new method? It's a good idea actually, but with `escapeHTML`, `escape_html`, and `h` aliased to it already, I can't think of a good name for it. And again, not calling it `escapeHTML` or `escape_html` would spoil the readability for no valid reason. -- https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/