ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:25360] [Bug #2043] incompatible character encodings
@ 2009-09-04 14:08 Vit Ondruch
  2009-09-04 17:12 ` [ruby-core:25369] [Bug #2043](Assigned) " Yui NARUSE
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Vit Ondruch @ 2009-09-04 14:08 UTC
  To: ruby-core

Bug #2043: incompatible character encodings
http://redmine.ruby-lang.org/issues/show/2043

Author: Vit Ondruch
Status: Open, Priority: Normal
ruby -v: ruby 1.9.2dev (2009-09-02) [i386-mswin32_90]

Why does the following example fail with the exception "Encoding::CompatibilityError: incompatible character encodings: Windows-1250 and UTF-8"?

s = "\u017Elu\u0165ou\u010dk\u00fd"
a = s.encode 'cp1250'
b = s.encode 'utf-8'
c = a + b

I would expect that if the strings are not in the same encoding, Ruby would do everything it can to satisfy me, but it only checks whether there is a possible conversion to ASCII, and otherwise an exception is raised. This is really annoying behavior.

Have you considered allowing such a string merge?


----------------------------------------
http://redmine.ruby-lang.org


* [ruby-core:25369] [Bug #2043](Assigned) incompatible character encodings
  2009-09-04 14:08 [ruby-core:25360] [Bug #2043] incompatible character encodings Vit Ondruch
@ 2009-09-04 17:12 ` Yui NARUSE
  2009-09-07  6:12 ` [ruby-core:25452] [Feature #2043] " Vit Ondruch
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Yui NARUSE @ 2009-09-04 17:12 UTC
  To: ruby-core

Issue #2043 has been updated by Yui NARUSE.

Status changed from Open to Assigned
Assigned to set to Yui NARUSE

Sorry, what is "possible conversion to ASCII"?

* [ruby-core:25452] [Feature #2043] incompatible character encodings
  2009-09-04 14:08 [ruby-core:25360] [Bug #2043] incompatible character encodings Vit Ondruch
  2009-09-04 17:12 ` [ruby-core:25369] [Bug #2043](Assigned) " Yui NARUSE
@ 2009-09-07  6:12 ` Vit Ondruch
  2009-09-07  6:22   ` [ruby-core:25453] " Nobuyoshi Nakada
  2009-09-07  6:33 ` [ruby-core:25454] " Yui NARUSE
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Vit Ondruch @ 2009-09-07  6:12 UTC
  To: ruby-core

Issue #2043 has been updated by Vit Ondruch.


In the following example, only US-ASCII characters are used, and in this case the concatenation works fine.

s = 'abc'
a = s.encode 'cp1250'
b = s.encode 'utf-8'
c = a + b

* [ruby-core:25453] Re: [Feature #2043] incompatible character encodings
  2009-09-07  6:12 ` [ruby-core:25452] [Feature #2043] " Vit Ondruch
@ 2009-09-07  6:22   ` Nobuyoshi Nakada
  0 siblings, 0 replies; 10+ messages in thread
From: Nobuyoshi Nakada @ 2009-09-07  6:22 UTC
  To: ruby-core

Hi,

At Mon, 7 Sep 2009 15:12:16 +0900,
Vit Ondruch wrote in [ruby-core:25452]:
> In the following example, only US-ASCII characters are used,
> and in this case the concatenation works fine.
> 
> s = 'abc'
> a = s.encode 'cp1250'
> b = s.encode 'utf-8'
> c = a + b

Because no conversion is needed.
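
A small sketch of what "no conversion is needed" means here, assuming a Ruby build that ships the Windows-1250 transcoder (the standard distribution does). The helper name `same_bytes_in_both?` is hypothetical and used only for illustration; it does not appear in the thread:

```ruby
# Hypothetical helper: does the text have the same byte sequence
# in Windows-1250 and in UTF-8?
def same_bytes_in_both?(text)
  text.encode('Windows-1250').bytes == text.encode('UTF-8').bytes
end

ascii = 'abc'
czech = "\u017Elu\u0165ou\u010Dk\u00FD"

# ASCII-only text is byte-for-byte identical in both encodings,
# so the two strings can be joined without transcoding anything.
puts same_bytes_in_both?(ascii)

# Accented characters take one byte in Windows-1250 but two in UTF-8,
# so joining them would require a real conversion.
puts same_bytes_in_both?(czech)

puts ascii.ascii_only?
puts czech.ascii_only?
```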

-- 
Nobu Nakada


* [ruby-core:25454] [Feature #2043] incompatible character encodings
  2009-09-04 14:08 [ruby-core:25360] [Bug #2043] incompatible character encodings Vit Ondruch
  2009-09-04 17:12 ` [ruby-core:25369] [Bug #2043](Assigned) " Yui NARUSE
  2009-09-07  6:12 ` [ruby-core:25452] [Feature #2043] " Vit Ondruch
@ 2009-09-07  6:33 ` Yui NARUSE
  2009-09-07  7:18 ` [ruby-core:25455] " Vit Ondruch
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Yui NARUSE @ 2009-09-07  6:33 UTC
  To: ruby-core

Issue #2043 has been updated by Yui NARUSE.


Ruby 1.9 doesn't do automatic conversion.
The ASCII character set is a special case,
because ASCII characters are the same characters in every ASCII-compatible encoding.

In Ruby 1.9's view, Unicode is not a superset of Windows-1252.
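
The rule described above can be observed with `Encoding.compatible?`, which returns the encoding a concatenation would use, or `nil` when there is none. This is only a sketch; `try_concat` is a made-up helper name, not something from the thread:

```ruby
# Hypothetical helper: return the concatenation, or the error message
# if Ruby refuses to combine the two strings.
def try_concat(left, right)
  left + right
rescue Encoding::CompatibilityError => e
  "error: #{e.message}"
end

czech = "\u017Elu\u0165ou\u010Dk\u00FD"
a = czech.encode('Windows-1250')
b = czech.encode('UTF-8')

# Both strings contain non-ASCII characters: no common encoding.
p Encoding.compatible?(a, b)   # nil
puts try_concat(a, b)          # prints the CompatibilityError message

# Both strings are ASCII-only: no conversion needed, so this works.
x = 'abc'.encode('Windows-1250')
y = 'abc'.encode('UTF-8')
puts try_concat(x, y)          # abcabc

# Only one side contains non-ASCII characters: this also works,
# and the result keeps the encoding of the non-ASCII side.
p Encoding.compatible?(a, y)
```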

* [ruby-core:25455] [Feature #2043] incompatible character encodings
  2009-09-04 14:08 [ruby-core:25360] [Bug #2043] incompatible character encodings Vit Ondruch
                   ` (2 preceding siblings ...)
  2009-09-07  6:33 ` [ruby-core:25454] " Yui NARUSE
@ 2009-09-07  7:18 ` Vit Ondruch
  2009-09-07  7:26   ` [ruby-core:25456] " Vincent Isambart
  2009-09-07  8:04 ` [ruby-core:25457] " Yui NARUSE
  2009-09-07 13:21 ` [ruby-core:25460] " Vit Ondruch
  5 siblings, 1 reply; 10+ messages in thread
From: Vit Ondruch @ 2009-09-07  7:18 UTC
  To: ruby-core

Issue #2043 has been updated by Vit Ondruch.


> In Ruby 1.9's view, Unicode is not a superset of Windows-1252.

Is "Ruby 1.9's view" described in detail somewhere? I still have the feeling that it is just half-baked :/

* [ruby-core:25456] Re: [Feature #2043] incompatible character encodings
  2009-09-07  7:18 ` [ruby-core:25455] " Vit Ondruch
@ 2009-09-07  7:26   ` Vincent Isambart
  0 siblings, 0 replies; 10+ messages in thread
From: Vincent Isambart @ 2009-09-07  7:26 UTC
  To: ruby-core

> Is "Ruby 1.9's view" described in detail somewhere? I still have the feeling that it is just half-baked :/

You can find explanations in James Gray's series of articles on 1.9's M17N:
http://blog.grayproductions.net/articles/ruby_19s_string
(for all the articles:
http://blog.grayproductions.net/categories/character_encodings)


* [ruby-core:25457] [Feature #2043] incompatible character encodings
  2009-09-04 14:08 [ruby-core:25360] [Bug #2043] incompatible character encodings Vit Ondruch
                   ` (3 preceding siblings ...)
  2009-09-07  7:18 ` [ruby-core:25455] " Vit Ondruch
@ 2009-09-07  8:04 ` Yui NARUSE
  2009-09-07 13:21 ` [ruby-core:25460] " Vit Ondruch
  5 siblings, 0 replies; 10+ messages in thread
From: Yui NARUSE @ 2009-09-07  8:04 UTC
  To: ruby-core

Issue #2043 has been updated by Yui NARUSE.


http://jp.rubyist.net/magazine/?0025-Ruby19_m17n
http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html
I wrote the first article above, but it is in Japanese; the second is its translation.

http://github.com/candlerb/string19/tree/master
James' articles and string19 are also good documentation.


People used to ISO 8859 may wonder why Unicode is not a superset of Windows-1252.
In Japan, because of the lack of standard conversion tables
between the Japanese legacy encodings (Shift_JIS, EUC-JP, ISO-2022-JP) and Unicode,
vendors use different tables.
Because of this sad situation, Unicode is not a simple superset of the legacy encodings.
Ruby 1.9 inherits this.

If a wide consensus on a standard table is reached before Ruby 2.0,
Ruby 2.0 may have automatic conversion (or Unicode may become the internal encoding).

* [ruby-core:25460] [Feature #2043] incompatible character encodings
  2009-09-04 14:08 [ruby-core:25360] [Bug #2043] incompatible character encodings Vit Ondruch
                   ` (4 preceding siblings ...)
  2009-09-07  8:04 ` [ruby-core:25457] " Yui NARUSE
@ 2009-09-07 13:21 ` Vit Ondruch
  2009-09-08  7:26   ` [ruby-core:25472] " "Martin J. Dürst"
  5 siblings, 1 reply; 10+ messages in thread
From: Vit Ondruch @ 2009-09-07 13:21 UTC
  To: ruby-core

Issue #2043 has been updated by Vit Ondruch.


Thank you for the links. They were interesting.

I'm looking forward to Ruby 2.0 and its automatic conversions, since writing c = a.encode('utf-8') + b.encode('utf-8') to safely concatenate two strings is not sexy at all.
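
The explicit approach mentioned above could be wrapped in a small helper. This is only a sketch: `concat_as` is a hypothetical name, not an existing Ruby method, and the caller still has to pick the target encoding:

```ruby
# Hypothetical helper: transcode every part to one explicitly chosen
# target encoding, then join the results.
def concat_as(target, *parts)
  parts.map { |part| part.encode(target) }.join
end

czech = "\u017Elu\u0165ou\u010Dk\u00FD"
a = czech.encode('Windows-1250')
b = czech.encode('UTF-8')

c = concat_as('UTF-8', a, b)
puts c.encoding        # UTF-8
puts c == czech * 2    # true
```

The target is not a neutral choice: `concat_as('Windows-1250', a, "\u65E5")` raises Encoding::UndefinedConversionError, because that character has no Windows-1250 representation.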

Vit

* [ruby-core:25472] Re: [Feature #2043] incompatible character encodings
  2009-09-07 13:21 ` [ruby-core:25460] " Vit Ondruch
@ 2009-09-08  7:26   ` "Martin J. Dürst"
  0 siblings, 0 replies; 10+ messages in thread
From: "Martin J. Dürst" @ 2009-09-08  7:26 UTC
  To: ruby-core

Hello Vit,

On 2009/09/07 22:21, Vit Ondruch wrote:
> Issue #2043 has been updated by Vit Ondruch.
>
>
> Thank you for the links. They were interesting.
>
> I'm looking forward to Ruby 2.0 and its automatic conversions, since writing c = a.encode('utf-8') + b.encode('utf-8') to safely concatenate two strings is not sexy at all.

It isn't always clear what target encoding you would want when 
concatenating two strings with different encodings.

Also, in other programming languages, you cannot really concatenate 
strings with different encodings. There is either (a) only one encoding 
for all (internal) strings, or (b) concatenation happens on the octet 
level, which (in terms of characters) means that you get a big mess as a 
result.

It is likely that Ruby will move towards (a). It is already rather close to
(a), in the sense that it's easy to write programs that work like (a). I
suggest that you try the (a) model in the program you are working on
and see how it works, rather than waiting for Ruby to go in a direction
that it may well choose not to take.
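
One way to approximate model (a) is to transcode only at the program's boundaries. The sketch below makes two assumptions that are not stated in the thread: UTF-8 is chosen as the single internal encoding, and the helper names `to_internal` and `to_external` are invented for illustration:

```ruby
# Assumption: UTF-8 is the one encoding used for all internal strings.
INTERNAL_ENCODING = Encoding::UTF_8

# Hypothetical helper: tag incoming bytes with the encoding they were
# written in, then transcode them to the internal encoding.
def to_internal(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode(INTERNAL_ENCODING)
end

# Hypothetical helper: transcode an internal string for output.
def to_external(string, target_encoding)
  string.encode(target_encoding)
end

# Stand-in for raw input, e.g. bytes read from a Windows-1250 file.
raw = "\u017Elu\u0165ou\u010Dk\u00FD".encode('Windows-1250').b

text = to_internal(raw, 'Windows-1250')
sentence = text + " k\u016F\u0148"   # plain concatenation: both sides are UTF-8

puts sentence.encoding
puts to_external(sentence, 'Windows-1250').bytesize == sentence.length
```

Inside the program every string is UTF-8, so concatenation never raises Encoding::CompatibilityError; the legacy encoding only appears where data enters or leaves.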

Regards,    Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
