Date: Sat, 20 Sep 2014 05:44:48 +0000
From: duerst@it.aoyama.ac.jp
To: ruby-core@ruby-lang.org
Subject: [ruby-core:65153] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize

Issue #10085 has been updated by Martin Dürst.
Target version set to Ruby 2.3.0

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-48999

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
* Category:
* Target version: Ruby 2.3.0
----------------------------------------
Case conversion functions are currently limited to ASCII characters. This may be appropriate for formal languages, but it is often inappropriate for applications that handle natural-language text.

To avoid backwards-compatibility problems, and to make it possible to select among the various variants of case conversion (e.g. language-dependent ones), we propose adding an optional parameter to the case conversion methods.

Our current design idea is as follows:

ASCII-only if no parameter is given:

  'Türkiye'.upcase # => 'TüRKIYE', note lower-case ü

A parameter triggers (general) Unicode conversion:

  'Türkiye'.upcase 'en' # => 'TÜRKIYE', note upper-case Ü

The parameter is actually a BCP 47 (http://tools.ietf.org/html/bcp47) language tag. This means that for languages with special case conversion rules, such as Turkish, it works as follows:

  'Türkiye'.upcase 'tr' # => 'TÜRKİYE', note upper-case İ (with dot!)

In the second example we used 'en', but most other language tags would work too, because a single case mapping is correct for most languages. Turkic languages are the biggest exception.

The Unicode Standard also defines several variants of case folding, which is usually lossy, e.g. it maps German ß to ss. It should be possible to include this functionality in this proposal as well, e.g. by using :symbols or CONSTANTs for the few specific foldings. It may also be possible to define a reversible variant of case conversion, in particular for use with swapcase.
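The dispatch on a language tag described above can be approximated in later Ruby versions. The following is only a minimal sketch, not the proposed API: `case_convert` and `TURKIC_LANGUAGES` are hypothetical names, and the code relies on the symbol options (:ascii, :turkic) that String#upcase, #downcase, #swapcase and #capitalize accept since Ruby 2.4 (which also added :fold to #downcase for case folding). It assumes that looking at the primary language subtag of the BCP 47 tag is enough; full tag parsing is out of scope.

```ruby
# Sketch only: `case_convert` is a hypothetical helper, not part of Ruby.
# It approximates "str.upcase(lang)" from the proposal using the symbol
# options that the case conversion methods accept since Ruby 2.4.

# Primary language subtags assumed to need the Turkic dotted/dotless i rules.
TURKIC_LANGUAGES = %w[tr az].freeze

# method: one of :upcase, :downcase, :swapcase, :capitalize
# lang:   a BCP 47 language tag such as "en" or "tr-TR", or nil
def case_convert(str, method, lang = nil)
  # No language tag: keep the historical ASCII-only behaviour.
  return str.public_send(method, :ascii) if lang.nil?

  # The primary language subtag is the part before the first hyphen;
  # BCP 47 tags are case-insensitive, so normalize it to lower case.
  primary = lang.split('-', 2).first.to_s.downcase

  if TURKIC_LANGUAGES.include?(primary)
    # Turkic mapping: i <-> İ and ı <-> I.
    str.public_send(method, :turkic)
  else
    # Default, language-independent Unicode case mapping.
    str.public_send(method)
  end
end

case_convert('Türkiye', :upcase)         # => "TüRKIYE"
case_convert('Türkiye', :upcase, 'en')   # => "TÜRKIYE"
case_convert('Türkiye', :upcase, 'tr')   # => "TÜRKİYE"
```

Because only the primary subtag is examined, tags such as 'tr-TR' or 'az-Latn' select the Turkic mapping, while 'en', 'de-CH' and most other tags fall through to the default Unicode mapping, mirroring the observation that one mapping serves most languages.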
In the long term, instead of passing a BCP 47 string directly, we could create a Locale class that incorporates language-specific facilities, but this needs more detailed consideration.

The idea of using an additional parameter to indicate language-dependent or other processing variants should be extensible to areas such as number-to-string conversion and date formatting. Although this proposal covers only case conversion, we should check that similar parameter conventions are likely to work for such extensions.

[This proposal is based on research done together with my student Kimihito Matsui.]

---Files--------------------------------
CaseConversion.pdf (340 KB)

-- 
https://bugs.ruby-lang.org/