Date: Wed, 23 Jul 2014 11:10:06 +0000
From: duerst@it.aoyama.ac.jp
To: ruby-core@ruby-lang.org
Subject: [ruby-core:63971] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize

Issue #10085 has been updated by Martin Dürst.
File CaseConversion.pdf added

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-47990

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee:
* Category:
* Target version:
----------------------------------------
Case conversion methods are currently limited to ASCII characters. That may be appropriate when processing formal languages, but it is often inappropriate for applications that handle natural-language text.

To avoid backwards-compatibility problems, and to make sure that the various variants of case conversion (e.g. language-dependent ones) can be selected, we propose adding an optional parameter to the case conversion methods. Our current design idea is as follows:

ASCII-only if no parameter is given:
'Türkiye'.upcase # => 'TüRKIYE', note lower-case ü

A parameter triggers (general) Unicode conversion:
'Türkiye'.upcase 'en' # => 'TÜRKIYE', note upper-case Ü

The parameter is a BCP 47 language tag (http://tools.ietf.org/html/bcp47). For languages with special case conversion rules, such as Turkish, this works as follows:
'Türkiye'.upcase 'tr' # => 'TÜRKİYE', note upper-case İ (with dot!)

In the second example we used 'en', but most other language tags would give the same result, because a single mapping is correct for most languages. Turkic languages, which distinguish dotted and dotless i, are the biggest exception.

The Unicode Standard also defines several variants of case folding, which is usually lossy (e.g. German ß is folded to ss). It should be possible to include this functionality in this proposal, e.g. by using :symbols or CONSTANTs for the few specific foldings. It may also be possible to define a reversible variant of case conversion, in particular for use with swapcase.
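The selection rules above (no argument means ASCII-only, a language tag selects Unicode mapping, a Turkic tag selects the dotted/dotless i rules) can be sketched in Ruby. This is a minimal sketch, not the proposed implementation: the helper `upcase_for` and the constant `TURKIC_LANGUAGES` are hypothetical, and the sketch assumes Ruby 2.4 or later, where `String#upcase` accepts the option symbols `:ascii` and `:turkic` and `String#downcase` accepts `:fold`. Only the primary subtag of the language tag is inspected, which is a simplification of full BCP 47 handling.

```ruby
# Hypothetical helper illustrating the proposed semantics (not a real Ruby API).
# Assumes Ruby 2.4+, where String#upcase accepts the :ascii and :turkic options.

# Primary language subtags that use Turkic case rules (i -> İ, ı -> I).
# Assumption: only Turkish and Azerbaijani, as in Unicode SpecialCasing.txt.
TURKIC_LANGUAGES = %w[tr az].freeze

def upcase_for(str, lang = nil)
  # No language tag: keep the current ASCII-only behavior.
  return str.upcase(:ascii) if lang.nil?

  # Look only at the primary subtag, e.g. "tr" in "tr-TR".
  primary = lang.split('-').first.downcase
  if TURKIC_LANGUAGES.include?(primary)
    str.upcase(:turkic)   # Turkic-specific full Unicode mapping
  else
    str.upcase            # general full Unicode mapping
  end
end

upcase_for('Türkiye')          # => "TüRKIYE"
upcase_for('Türkiye', 'en')    # => "TÜRKIYE"
upcase_for('Türkiye', 'tr')    # => "TÜRKİYE"

# A specific folding selected by a symbol, as suggested for case folding:
'Straße'.downcase(:fold)       # => "strasse"
```

Inspecting only the primary subtag keeps the sketch short; a complete solution would also have to parse script, region, and other BCP 47 subtags.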
In the long term, instead of a plain BCP 47 string, we could create a Locale class that incorporates language-specific facilities, but this needs more detailed consideration.

The idea of using an additional parameter to indicate language-dependent or other processing variants should be extensible to areas such as number-to-string conversion and date formatting. While this proposal is only about case conversion, we should check that similar parameter conventions are likely to work for such extensions.

[This proposal is based on research done together with my student Kimihito Matsui.]

---Files--------------------------------
CaseConversion.pdf (340 KB)

-- 
https://bugs.ruby-lang.org/