[ruby-core:63964] [CommonRuby - Feature #10085] [Open] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize

ruby-core@ruby-lang.org archive (unofficial mirror)

* [ruby-core:63964] [CommonRuby - Feature #10085] [Open] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
@ 2014-07-23 11:04 ` duerst
  2014-07-23 11:06 ` [ruby-core:63966] [CommonRuby - Feature #10085] " duerst
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: duerst @ 2014-07-23 11:04 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been reported by Martin Dürst.

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: 
* Category: 
* Target version: 
----------------------------------------
Case conversion functions are currently limited to ASCII characters. When used with formal languages, that may be appropriate, but it is often not appropriate for applications.

In order to avoid backwards-compatibility problems and to make sure that the various variants of case conversion (e.g. language-dependent) can be selected, we propose to add an optional parameter to the case conversion functions.

Our current design idea is as follows:

ASCII-only if no parameter:
'Türkiye'.upcase    # => 'TüRKIYE', note lower-case ü

Parameter triggers (general) Unicode conversion:
'Türkiye'.upcase 'en'  # => 'TÜRKIYE', note upper-case Ü

The parameter is actually a BCP 47 (http://tools.ietf.org/html/bcp47) language tag.
This means that for languages with special case conversion rules, such as Turkish, this works as follows:
'Türkiye'.upcase 'tr'  # => 'TÜRKİYE', note upper-case İ (with dot!)

In the second example, we used 'en', but most other languages would work, too, because a single case conversion works for most languages. Turkic languages are the biggest exception.
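For comparison with the design idea above: Ruby 2.4 and later ship Unicode-aware case conversion, but select the variant with symbol options such as :ascii and :turkic rather than with a BCP 47 string. The following is a minimal sketch under that assumption (Ruby 2.4 or later, UTF-8 source file); the helper name `upcase_variants` is made up for illustration and is not part of the proposal.

```ruby
# Hypothetical helper (not part of the proposal): collects the three
# variants discussed above, using the options accepted by Ruby >= 2.4.
def upcase_variants(str)
  {
    ascii:   str.upcase(:ascii),   # only a-z are converted: 'TüRKIYE'
    default: str.upcase,           # general Unicode conversion: 'TÜRKIYE'
    turkic:  str.upcase(:turkic)   # Turkic rules, i -> İ: 'TÜRKİYE'
  }
end

upcase_variants('Türkiye').each do |variant, result|
  puts "#{variant}: #{result}"
end
```

Note that in this shipped API the no-argument call is the Unicode-aware one, and the ASCII-only behavior has to be requested explicitly.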

The Unicode standard also defines various kinds of "case folding", which is usually lossy, e.g. mapping German ß to ss. It should be possible to include this functionality in this proposal, e.g. by using :symbols or CONSTANTs for the few specific foldings. It may also be possible to define a reversible variant of case conversion, in particular for use with swapcase.
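To illustrate why case folding is lossy, here is a small sketch that assumes Ruby 2.4 or later, where String#downcase accepts a :fold option. The helper name `fold_key` is hypothetical; it only shows how a folded string can serve as a comparison key.

```ruby
# Hypothetical helper: returns a key suitable for caseless comparison.
# Assumes Ruby >= 2.4, where downcase(:fold) applies Unicode case folding.
def fold_key(str)
  str.downcase(:fold)
end

word = 'Straße'
puts fold_key(word)                       # ß is folded to ss
puts fold_key(word) == fold_key('STRASSE')

# Upcasing and downcasing again does not restore the original,
# because ß becomes SS and then ss.
puts word.upcase.downcase == word
```

Because the ß cannot be recovered from the folded form, such a key is useful for matching but not for display.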

In the long term, instead of a direct BCP 47 string, we could create a Locale class that would incorporate language-specific facilities, but this may need more detailed considerations.

The idea of using an additional parameter to indicate language-dependent or other processing variants should be extensible to areas such as number-to-string conversion and date formatting. While this proposal is only about case conversion, we should check that there is a good chance of using similar parameter conventions for such extensions.

[This proposal is based on research done together with my student Kimihito Matsui.]

-- 
https://bugs.ruby-lang.org/


* [ruby-core:63966] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
  2014-07-23 11:04 ` [ruby-core:63964] [CommonRuby - Feature #10085] [Open] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize duerst
@ 2014-07-23 11:06 ` duerst
  2014-07-23 11:06 ` [ruby-core:63969] " duerst
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: duerst @ 2014-07-23 11:06 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by Martin Dürst.

Related to Bug #3376: russian support added

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-47984

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: 
* Category: 
* Target version: 
----------------------------------------

* [ruby-core:63969] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
  2014-07-23 11:04 ` [ruby-core:63964] [CommonRuby - Feature #10085] [Open] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize duerst
  2014-07-23 11:06 ` [ruby-core:63966] [CommonRuby - Feature #10085] " duerst
@ 2014-07-23 11:06 ` duerst
  2014-07-23 11:07 ` [ruby-core:63968] " duerst
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: duerst @ 2014-07-23 11:06 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by Martin Dürst.

Related to Feature #2034: Consider the ICU Library for Improving and Expanding Unicode Support added

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-47986

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: 
* Category: 
* Target version: 
----------------------------------------

* [ruby-core:63968] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2014-07-23 11:06 ` [ruby-core:63969] " duerst
@ 2014-07-23 11:07 ` duerst
  2014-07-23 11:10 ` [ruby-core:63971] " duerst
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: duerst @ 2014-07-23 11:07 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by Martin Dürst.

Related to Feature #10002: String swapcase added

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-47988

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: 
* Category: 
* Target version: 
----------------------------------------

* [ruby-core:63971] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2014-07-23 11:07 ` [ruby-core:63968] " duerst
@ 2014-07-23 11:10 ` duerst
  2014-07-26  8:47 ` [ruby-core:64046] " matz
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: duerst @ 2014-07-23 11:10 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by Martin Dürst.

File CaseConversion.pdf added

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-47990

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: 
* Category: 
* Target version: 
----------------------------------------

---Files--------------------------------
CaseConversion.pdf (340 KB)


* [ruby-core:64046] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2014-07-23 11:10 ` [ruby-core:63971] " duerst
@ 2014-07-26  8:47 ` matz
  2014-08-20  1:39 ` [ruby-core:64467] " matz
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: matz @ 2014-07-26  8:47 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by Yukihiro Matsumoto.

Assignee set to Martin Dürst

I want the default case conversion to be Unicode-aware (when the encoding is Unicode).
The previous behavior can be obtained with `str.downcase(:ascii)`.

Non-Unicode encodings (e.g. Latin-1) may support non-ASCII case conversion, but this is not mandatory.

Matz.
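A minimal sketch of the behavior requested in this message, assuming Ruby 2.4 or later (where the default did become Unicode-aware for Unicode strings and :ascii restores the old behavior). The helper name `downcase_both` is hypothetical.

```ruby
# Hypothetical helper: returns the Unicode-aware result and the
# ASCII-only result side by side. Assumes Ruby >= 2.4.
def downcase_both(str)
  [str.downcase, str.downcase(:ascii)]
end

unicode_aware, ascii_only = downcase_both('ÀB')
puts unicode_aware  # both letters are lowercased
puts ascii_only     # only the ASCII letter B is lowercased
```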

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-48058

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: 
----------------------------------------

* [ruby-core:64467] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2014-07-26  8:47 ` [ruby-core:64046] " matz
@ 2014-08-20  1:39 ` matz
  2014-09-20  5:44 ` [ruby-core:65153] " duerst
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: matz @ 2014-08-20  1:39 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by Yukihiro Matsumoto.

Related to Feature #10152: String#strip doesn't remove non-breaking space added

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-48413

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: 
----------------------------------------

* [ruby-core:65153] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2014-08-20  1:39 ` [ruby-core:64467] " matz
@ 2014-09-20  5:44 ` duerst
  2014-12-31  6:26 ` [ruby-core:67246] " duerst
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: duerst @ 2014-09-20  5:44 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by Martin Dürst.

Target version set to Ruby 2.3.0

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-48999

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: Ruby 2.3.0
----------------------------------------

* [ruby-core:67246] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2014-09-20  5:44 ` [ruby-core:65153] " duerst
@ 2014-12-31  6:26 ` duerst
  2014-12-31  9:19 ` [ruby-core:67254] " akr
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: duerst @ 2014-12-31  6:26 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by Martin Dürst.

Related to Bug #10550: Resolv::DNS.getaddresses returns no IPs when nameserver returns in differing case than query added

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-50716

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: Ruby 2.3.0
----------------------------------------

* [ruby-core:67254] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2014-12-31  6:26 ` [ruby-core:67246] " duerst
@ 2014-12-31  9:19 ` akr
  2015-07-09  5:12 ` [ruby-core:69915] " 2851820660
  2017-10-15 23:21 ` [ruby-core:83302] [CommonRuby Feature#10085][Closed] " duerst
  11 siblings, 0 replies; 12+ messages in thread
From: akr @ 2014-12-31  9:19 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by Akira Tanaka.

The related issue, [Bug #10550] Resolv::DNS.getaddresses, needs ASCII-only case conversion.
Unicode aware case conversion is not suitable for the issue.
See RFC 4343.
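The point can be sketched as follows, assuming Ruby 2.4 or later for the :ascii option. The helper `same_dns_name?` is hypothetical and is not the code used in Resolv; it only shows why ASCII-only conversion is the right tool for comparing DNS names.

```ruby
# Hypothetical helper: compares two DNS names the way RFC 4343 describes,
# treating only the ASCII letters A-Z and a-z as case-insensitive.
def same_dns_name?(a, b)
  a.downcase(:ascii) == b.downcase(:ascii)
end

puts same_dns_name?('Example.COM', 'example.com')

# KELVIN SIGN (U+212A) lowercases to a plain ASCII "k" under
# Unicode-aware conversion, which would wrongly equate distinct names.
kelvin = "\u212A"
puts kelvin.downcase == 'k'          # Unicode-aware: treated as equal
puts same_dns_name?(kelvin, 'k')     # ASCII-only: kept distinct
```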

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-50724

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
* Category: 
* Target version: Ruby 2.3.0
----------------------------------------

* [ruby-core:69915] [CommonRuby - Feature #10085] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
                   ` (9 preceding siblings ...)
  2014-12-31  9:19 ` [ruby-core:67254] " akr
@ 2015-07-09  5:12 ` 2851820660
  2017-10-15 23:21 ` [ruby-core:83302] [CommonRuby Feature#10085][Closed] " duerst
  11 siblings, 0 replies; 12+ messages in thread
From: 2851820660 @ 2015-07-09  5:12 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by 11 22.

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-53334

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Martin Dürst
----------------------------------------

---Files--------------------------------
CaseConversion.pdf (340 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:83302] [CommonRuby Feature#10085][Closed] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
       [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
                   ` (10 preceding siblings ...)
  2015-07-09  5:12 ` [ruby-core:69915] " 2851820660
@ 2017-10-15 23:21 ` duerst
  11 siblings, 0 replies; 12+ messages in thread
From: duerst @ 2017-10-15 23:21 UTC (permalink / raw)
  To: ruby-core

Issue #10085 has been updated by duerst (Martin Dürst).

Status changed from Open to Closed

Close way overdue, should have happened somewhere around r55281.

----------------------------------------
Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize
https://bugs.ruby-lang.org/issues/10085#change-67261

* Author: duerst (Martin Dürst)
* Status: Closed
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 
----------------------------------------

---Files--------------------------------
CaseConversion.pdf (340 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-10-15 23:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <redmine.issue-10085.20140723110448@ruby-lang.org>
2014-07-23 11:04 ` [ruby-core:63964] [CommonRuby - Feature #10085] [Open] Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize duerst
2014-07-23 11:06 ` [ruby-core:63966] [CommonRuby - Feature #10085] " duerst
2014-07-23 11:06 ` [ruby-core:63969] " duerst
2014-07-23 11:07 ` [ruby-core:63968] " duerst
2014-07-23 11:10 ` [ruby-core:63971] " duerst
2014-07-26  8:47 ` [ruby-core:64046] " matz
2014-08-20  1:39 ` [ruby-core:64467] " matz
2014-09-20  5:44 ` [ruby-core:65153] " duerst
2014-12-31  6:26 ` [ruby-core:67246] " duerst
2014-12-31  9:19 ` [ruby-core:67254] " akr
2015-07-09  5:12 ` [ruby-core:69915] " 2851820660
2017-10-15 23:21 ` [ruby-core:83302] [CommonRuby Feature#10085][Closed] " duerst

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).