ruby-core@ruby-lang.org archive (unofficial mirror)
From: duerst@it.aoyama.ac.jp
To: ruby-core@ruby-lang.org
Subject: [ruby-core:83302] [CommonRuby Feature#10085][Closed] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
Date: Sun, 15 Oct 2017 23:21:42 +0000 (UTC)
Message-ID: <redmine.journal-67261.20171015232141.6be447a407ff9cce@ruby-lang.org>
In-Reply-To: redmine.issue-10085.20140723110448@ruby-lang.org

Issue #10085 has been updated by duerst (Martin Dürst).

Status changed from Open to Closed

Closing this was way overdue; it should have happened around r55281.

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-67261

* Author: duerst (Martin Dürst)
* Status: Closed
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 
----------------------------------------
Case conversion functions are currently limited to ASCII characters. This may be appropriate when processing formal languages, but it is often inappropriate for applications that handle natural-language text.

In order to avoid backwards-compatibility problems and to make sure that the various variants of case conversion (e.g. language-dependent) can be selected, we propose to add an optional parameter to the case conversion functions.

Our current design idea is as follows:

ASCII-only if no parameter:
'Türkiye'.upcase    # => 'TüRKIYE', note lower-case ü

Parameter triggers (general) Unicode conversion:
'Türkiye'.upcase 'en'  # => 'TÜRKIYE', note upper-case Ü

The parameter is actually a BCP 47 (http://tools.ietf.org/html/bcp47) language tag.
This means that for languages with special case conversion rules, such as Turkish, this works as follows:
'Türkiye'.upcase 'tr'  # => 'TÜRKİYE', note upper-case İ (with dot!)
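
For comparison, the feature as it eventually shipped (Ruby 2.4 and later) uses symbol options instead of BCP 47 tags, and makes general Unicode mapping the default; ASCII-only behaviour has to be requested explicitly with :ascii. A short sketch of the three examples above under that interface, assuming Ruby 2.4 or later:

```ruby
# Assumes Ruby >= 2.4, where String#upcase does Unicode case mapping by default.
word = 'Türkiye'

ascii_only = word.upcase(:ascii)   # ASCII letters only; ü stays lower-case
general    = word.upcase           # language-independent Unicode mapping
turkic     = word.upcase(:turkic)  # Turkic rules: i -> İ (U+0130, dotted capital I)

puts ascii_only  # => TüRKIYE
puts general     # => TÜRKIYE
puts turkic      # => TÜRKİYE
```

Note that the default is the reverse of this proposal: code that relied on ASCII-only behaviour needs :ascii after the change.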

In the second example we used 'en', but most other language tags would give the same result, because a single, language-independent case mapping is correct for most languages. Turkic languages are the biggest exception.
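
The tag-based interface proposed here can be approximated on top of the options Ruby 2.4 and later provide. The helper below, upcase_with_tag, is hypothetical (it is not part of Ruby); it looks only at the primary language subtag and assumes that Turkish (tr) and Azerbaijani (az) are the languages needing Turkic rules, as in Unicode's SpecialCasing data:

```ruby
# Hypothetical helper, not part of Ruby: emulates the proposed
# `upcase(language_tag)` interface using Ruby >= 2.4 options.
TURKIC_LANGUAGES = %w[tr az].freeze

def upcase_with_tag(str, tag = nil)
  # No parameter: ASCII-only, as in the proposal.
  return str.upcase(:ascii) if tag.nil?

  # Only the primary subtag matters here, e.g. "tr-TR" -> "tr", "AZ" -> "az".
  primary = tag.split('-').first.to_s.downcase
  if TURKIC_LANGUAGES.include?(primary)
    str.upcase(:turkic)  # dotted i -> İ
  else
    str.upcase           # one general mapping covers most languages
  end
end

%w[en de fr tr az-Latn].each do |tag|
  puts "#{tag}: #{upcase_with_tag('Türkiye', tag)}"
end
```

Only tr and az-Latn produce the dotted İ; en, de and fr all give the same general mapping, which is the point made above.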

The Unicode Standard also defines several variants of case folding, which is usually lossy, e.g. it maps German ß to ss. It should be possible to include this functionality in this proposal, e.g. by using :symbols or CONSTANTs for the few specific foldings. It may also be possible to define a reversible variant of case conversion, in particular for use with swapcase.
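
The lossiness is easy to see with ß. The sketch below assumes Ruby 2.4 or later, where full case mapping (including ß -> SS) is the default and case folding is available as the :fold option of downcase; the word 'Straße' is just an illustrative input:

```ruby
# Assumes Ruby >= 2.4. Shows why case mapping and folding can be lossy.
word = 'Straße'

upper  = word.upcase           # full case mapping: ß -> SS
back   = upper.downcase        # SS -> ss, so the original ß is not recovered
folded = word.downcase(:fold)  # case folding (downcase only): ß -> ss

puts upper   # => STRASSE
puts back    # => strasse
puts folded  # => strasse

# Folding both sides makes caseless comparison ignore the ß/SS difference.
puts folded == 'STRASSE'.downcase(:fold)  # => true
```

Because upcase followed by downcase does not restore 'Straße', any conversion meant to be reversible (such as the swapcase variant mentioned above) would need different rules for characters like ß.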

In the long term, instead of a direct BCP 47 string, we could create a Locale class that would incorporate language-specific facilities, but this may need more detailed considerations.

The idea of using an additional parameter to indicate language-dependent or other processing variants should be extensible to areas such as number-to-string conversion and date formatting. While this proposal is only about case conversion, we should check that similar parameter conventions could be used for such extensions.

[This proposal is based on research done together with my student Kimihito Matsui.]


---Files--------------------------------
CaseConversion.pdf (340 KB)


-- 
https://bugs.ruby-lang.org/


Thread overview: 12+ messages
     [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
2014-07-23 11:04 ` [ruby-core:63964] [CommonRuby - Feature #10085] [Open] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize duerst
2014-07-23 11:06 ` [ruby-core:63966] [CommonRuby - Feature #10085] " duerst
2014-07-23 11:06 ` [ruby-core:63969] " duerst
2014-07-23 11:07 ` [ruby-core:63968] " duerst
2014-07-23 11:10 ` [ruby-core:63971] " duerst
2014-07-26  8:47 ` [ruby-core:64046] " matz
2014-08-20  1:39 ` [ruby-core:64467] " matz
2014-09-20  5:44 ` [ruby-core:65153] " duerst
2014-12-31  6:26 ` [ruby-core:67246] " duerst
2014-12-31  9:19 ` [ruby-core:67254] " akr
2015-07-09  5:12 ` [ruby-core:69915] " 2851820660
2017-10-15 23:21 ` duerst [this message]
