From: matz@ruby-lang.org
To: ruby-core@ruby-lang.org
Date: Sat, 26 Jul 2014 08:47:55 +0000
Subject: [ruby-core:64046] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize

Issue #10085 has been updated by Yukihiro Matsumoto.

Assignee set to Martin Dürst

I want the default case conversion to be Unicode-aware (when the encoding is Unicode).
The previous behavior can be obtained with `str.downcase(:ascii)`.
Non-Unicode encodings (e.g. Latin-1) can support non-ASCII case conversion, but this is not mandatory.

Matz.

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-48058

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
* Category:
* Target version:
----------------------------------------
Case conversion functions are currently limited to ASCII characters. When used with formal languages, that may be appropriate, but it is often not appropriate for applications.

In order to avoid backwards-compatibility problems and to make sure that the various variants of case conversion (e.g. language-dependent) can be selected, we propose to add an optional parameter to the case conversion functions. Our current design idea is as follows:

ASCII-only if no parameter:

    'Türkiye'.upcase        # => 'TüRKIYE', note lower-case ü

Parameter triggers (general) Unicode conversion:

    'Türkiye'.upcase 'en'   # => 'TÜRKIYE', note upper-case Ü

The parameter is actually a BCP 47 (http://tools.ietf.org/html/bcp47) language tag.

This means that for languages with special case conversion rules, such as Turkish, this works as follows:

    'Türkiye'.upcase 'tr'   # => 'TÜRKİYE', note upper-case İ (with dot!)

In the second example, we used 'en', but most other languages would work, too, because a single case conversion works for most languages. Turkic languages are the biggest exception.

The Unicode standard also defines various cases of "case-folding", which is usually lossy, e.g. mapping German ß to ss and so on. It should be possible to include this functionality in this proposal, e.g. by using :symbols or CONSTANTs for the few specific foldings. It may also be possible to define a reversible variant of case conversion, in particular for use with swapcase.

In the long term, instead of a direct BCP 47 string, we could create a Locale class that would incorporate language-specific facilities, but this may need more detailed consideration.

The idea of using an additional parameter to indicate language-dependent or other processing variants should be extensible to areas such as number-to-string conversion and date formatting. While this proposal is only about case conversion, we should check that there is a good chance of using similar parameter conventions for such extensions.

[This proposal is based on research done together with my student Kimihito Matsui.]

---Files--------------------------------
CaseConversion.pdf (340 KB)

--
https://bugs.ruby-lang.org/
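
The behaviors discussed in this thread can be sketched in Ruby as follows. This is only an illustration, not the implementation proposed in the issue. It assumes Ruby 2.4 or later, where the case conversion methods are Unicode-aware by default and accept the `:ascii` and `:turkic` options (plus `:fold` on `downcase`). The names `upcase_for` and `TURKIC_LANGUAGES` are hypothetical, introduced here only to mimic the BCP 47 parameter of the original design idea.

```ruby
# Assumption: Ruby >= 2.4. TURKIC_LANGUAGES and upcase_for are hypothetical
# names for this sketch; they are not part of Ruby or of the proposal.
TURKIC_LANGUAGES = %w[tr az].freeze

# Upcase +str+ in the style of the original design idea:
# no language tag means ASCII-only, a tag means Unicode conversion,
# and a Turkic primary language subtag selects the Turkic rules.
def upcase_for(str, lang = nil)
  return str.upcase(:ascii) if lang.nil?

  primary = lang.downcase.split('-').first
  TURKIC_LANGUAGES.include?(primary) ? str.upcase(:turkic) : str.upcase
end

puts upcase_for('Türkiye')          # TüRKIYE (ASCII-only, ü unchanged)
puts upcase_for('Türkiye', 'en')    # TÜRKIYE (general Unicode conversion)
puts upcase_for('Türkiye', 'tr')    # TÜRKİYE (dotted capital İ)

# Default and ASCII-only behavior as described in the reply above.
puts 'TÜRKIYE'.downcase             # türkiye
puts 'TÜRKIYE'.downcase(:ascii)     # tÜrkiye

# Lossy case folding, e.g. German ß to ss.
puts 'Straße'.downcase(:fold)       # strasse
```

Only the primary language subtag is inspected, so a tag such as 'tr-TR' is treated like 'tr'. A fuller treatment of BCP 47 tags, or the Locale class mentioned in the issue, is outside the scope of this sketch.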