ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:117449] [Ruby master Bug#20412] UTF-8 String encoding behavior differs between 3.2, 3.3 and master
@ 2024-04-06 20:58 bannable (Joe Truba) via ruby-core
  2024-04-07  7:06 ` [ruby-core:117452] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: bannable (Joe Truba) via ruby-core @ 2024-04-06 20:58 UTC
  To: ruby-core; +Cc: bannable (Joe Truba)

Issue #20412 has been reported by bannable (Joe Truba).

----------------------------------------
Bug #20412: UTF-8 String encoding behavior differs between 3.2, 3.3 and master
https://bugs.ruby-lang.org/issues/20412

* Author: bannable (Joe Truba)
* Status: Open
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When a String that contains only a `\0` byte is mutated by an extension into an invalid UTF-8 sequence, calling `.encode('UTF-8')` does not consistently raise `UndefinedConversionError` across Ruby versions. When the string is longer than 1 byte, all versions I've tested correctly raise `UndefinedConversionError`.

For Ruby 3.2, whether `UndefinedConversionError` is raised appears to depend on where the string was originally allocated.

For Ruby 3.3, `UndefinedConversionError` is never raised.

For master ad90fdd24c, `UndefinedConversionError` is always correctly raised.

I haven't been able to find a bug for this, but it seems like there is a fix in master that should be backported to at least 3.2 and 3.3.

I have not tested 3.1.

The attached reproducer depends on `rbnacl` because it is minimized from a cryptographic project, and I wasn't able to reduce it further.
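
For reference, a minimal sketch of the behavior the report expects, using plain Ruby strings with no extension involved. It does not reproduce the bug; it only shows that transcoding a binary string to UTF-8 succeeds for a NUL byte and raises for a byte at or above 0x80. The helper name `encodes_to_utf8?` is hypothetical and not part of `repro.rb`.

```ruby
# Hypothetical helper (not part of repro.rb): reports whether a binary
# string can be transcoded to UTF-8 without raising.
def encodes_to_utf8?(bytes)
  bytes.encode(Encoding::UTF_8)
  true
rescue Encoding::UndefinedConversionError
  false
end

nul  = "\0".b    # one NUL byte, tagged ASCII-8BIT
high = "\xC0".b  # one byte >= 0x80, tagged ASCII-8BIT

p encodes_to_utf8?(nul)   # prints true
p encodes_to_utf8?(high)  # prints false
```

In the report, the `FAIL` lines correspond to the second case returning without an error after the buffer was overwritten by the extension.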

## Expected Output

For all versions:
```
$ ruby repro.rb 1
"RUBY: [version]"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"

$ ruby repro.rb 2
"RUBY: [version]"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
```

## Actual Output

### Ruby 3.2
```
$ ASDF_RUBY_VERSION=3.2.3 ruby -v; ASDF_RUBY_VERSION=3.2.3 ruby repro.rb 1
ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"

$ ASDF_RUBY_VERSION=3.2.3 ruby -v; ASDF_RUBY_VERSION=3.2.3 ruby repro.rb 2
ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
```

### Ruby 3.3
```
$ ASDF_RUBY_VERSION=3.3.0 ruby -v; ASDF_RUBY_VERSION=3.3.0 ruby repro.rb 1
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
"RUBY: 3.3.0"
"FAIL: ciphertext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: plaintext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"

$ ASDF_RUBY_VERSION=3.3.0 ruby -v; ASDF_RUBY_VERSION=3.3.0 ruby repro.rb 2
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
"RUBY: 3.3.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
```

### Ruby Master
```
$ ASDF_RUBY_VERSION=ruby-dev ruby -v; ASDF_RUBY_VERSION=ruby-dev ruby repro.rb 1
ruby 3.4.0dev (2024-04-06T17:33:16Z master ad90fdd24c) [x86_64-linux]
"RUBY: 3.4.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"

$ ASDF_RUBY_VERSION=ruby-dev ruby -v; ASDF_RUBY_VERSION=ruby-dev ruby repro.rb 2
ruby 3.4.0dev (2024-04-06T17:33:16Z master ad90fdd24c) [x86_64-linux]
"RUBY: 3.4.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
```

---Files--------------------------------
repro.rb (2.31 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:117452] [Ruby master Bug#20412] UTF-8 String encoding behavior differs between 3.2, 3.3 and master
  2024-04-06 20:58 [ruby-core:117449] [Ruby master Bug#20412] UTF-8 String encoding behavior differs between 3.2, 3.3 and master bannable (Joe Truba) via ruby-core
@ 2024-04-07  7:06 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-04-08  8:56 ` [ruby-core:117464] " etienne via ruby-core
  2024-04-09  1:16 ` [ruby-core:117467] " bannable (Joe Truba) via ruby-core
  2 siblings, 0 replies; 4+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-04-07  7:06 UTC
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20412 has been updated by nobu (Nobuyoshi Nakada).


This may be related to the cached code range flags (#19902?).
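
A minimal sketch of why a cached code range could matter, assuming the hypothesis above is the cause (the thread does not confirm it). When a string is mutated through the String API, Ruby re-evaluates it, so `ascii_only?` and `encode` see the new byte. An extension that writes into the buffer through a raw pointer would bypass that step, which is the situation the report describes.

```ruby
buf = "\0".b              # binary string holding a single NUL byte
before = buf.ascii_only?  # scans the bytes; the result may be cached internally

buf.setbyte(0, 0xC0)      # mutation through the String API
after = buf.ascii_only?   # re-evaluated after the mutation

raised = begin
  buf.encode(Encoding::UTF_8)
  false
rescue Encoding::UndefinedConversionError
  true
end

p [before, after, raised]  # prints [true, false, true]
```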

----------------------------------------
Bug #20412: UTF-8 String encoding behavior differs between 3.2, 3.3 and master
https://bugs.ruby-lang.org/issues/20412#change-107836

* Author: bannable (Joe Truba)
* Status: Open
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN

* [ruby-core:117464] [Ruby master Bug#20412] UTF-8 String encoding behavior differs between 3.2, 3.3 and master
  2024-04-06 20:58 [ruby-core:117449] [Ruby master Bug#20412] UTF-8 String encoding behavior differs between 3.2, 3.3 and master bannable (Joe Truba) via ruby-core
  2024-04-07  7:06 ` [ruby-core:117452] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-04-08  8:56 ` etienne via ruby-core
  2024-04-09  1:16 ` [ruby-core:117467] " bannable (Joe Truba) via ruby-core
  2 siblings, 0 replies; 4+ messages in thread
From: etienne via ruby-core @ 2024-04-08  8:56 UTC
  To: ruby-core; +Cc: etienne

Issue #20412 has been updated by etienne (Étienne Barrié).


Hey,

I cannot reproduce this, either with the ruby:3.2.3 Docker image or with my local installations of Ruby 3.2.3 and 3.2.2.

In all these cases, I get "OK: ... is not valid UTF-8" for every string. The only changes I made to the script were to always use size 1 and to load the gem with bundler/inline:

```ruby
# encoding: ASCII-8BIT
# frozen_string_literal: false

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "rbnacl"
end

p "RUBY: #{RUBY_VERSION}"
require 'rbnacl'

class Encrypter
  extend RbNaCl::Sodium

  sodium_type :stream

  sodium_primitive :xchacha20

  sodium_function :stream_xchacha20_xor,
                  :crypto_stream_xchacha20_xor,
                  %i[pointer pointer ulong_long pointer pointer]

  attr_reader :key

  def initialize(key)
    @key = key
  end

  def encrypt_with_rbnacl_buffer(nonce, message)
    c = RbNaCl::Util.zeros(message.bytesize)
    self.class.stream_xchacha20_xor(c, message, message.bytesize, nonce, key)
    c
  end

  def encrypt_with_local_buffer(nonce, message)
    c = "\0" * message.bytesize
    self.class.stream_xchacha20_xor(c, message, message.bytesize, nonce, key)
    c
  end
end

begin
  "\xC0".encode('UTF-8')
  p 'FAIL: plaintext is not valid UTF-8 and did not error during encoding to UTF-8'
rescue StandardError
end

SIZE = 1

input = ("\xC0" * SIZE) + ' '
nonce = 'B' * 24
key = 'A' * 32

enc = Encrypter.new(key)

ciphertext_rbnacl = enc.encrypt_with_rbnacl_buffer(nonce, input)
ciphertext_local = enc.encrypt_with_local_buffer(nonce, input)

plaintext_rbnacl = enc.encrypt_with_rbnacl_buffer(nonce, ciphertext_rbnacl)
plaintext_local = enc.encrypt_with_local_buffer(nonce, ciphertext_local)

begin
  input.encode('UTF-8')
  p 'FAIL: input is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
end

begin
  ciphertext_rbnacl.encode('UTF-8')
  p 'FAIL: ciphertext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: ciphertext_rbnacl is not valid UTF-8'
end

begin
  ciphertext_local.encode('UTF-8')
  p 'FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: ciphertext_local is not valid UTF-8'
end

begin
  plaintext_rbnacl.encode('UTF-8')
  p 'FAIL: plaintext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: plaintext_rbnacl is not valid UTF-8'
end

begin
  plaintext_local.encode('UTF-8')
  p 'FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: plaintext_local is not valid UTF-8'
end
```

Which version of libsodium are you using? Perhaps some specific version mutates a char * string?

----------------------------------------
Bug #20412: UTF-8 String encoding behavior differs between 3.2, 3.3 and master
https://bugs.ruby-lang.org/issues/20412#change-107853

* Author: bannable (Joe Truba)
* Status: Open
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN

* [ruby-core:117467] [Ruby master Bug#20412] UTF-8 String encoding behavior differs between 3.2, 3.3 and master
  2024-04-06 20:58 [ruby-core:117449] [Ruby master Bug#20412] UTF-8 String encoding behavior differs between 3.2, 3.3 and master bannable (Joe Truba) via ruby-core
  2024-04-07  7:06 ` [ruby-core:117452] " nobu (Nobuyoshi Nakada) via ruby-core
  2024-04-08  8:56 ` [ruby-core:117464] " etienne via ruby-core
@ 2024-04-09  1:16 ` bannable (Joe Truba) via ruby-core
  2 siblings, 0 replies; 4+ messages in thread
From: bannable (Joe Truba) via ruby-core @ 2024-04-09  1:16 UTC
  To: ruby-core; +Cc: bannable (Joe Truba)

Issue #20412 has been updated by bannable (Joe Truba).


@eti

etienne (Étienne Barrié) wrote in #note-3:
> Hey,
> 
> I cannot reproduce using the ruby:3.2.3 docker image and with my local installation of Ruby 3.2.3 and 3.2.2.
> 
> In all these cases, I get "OK" "is not valid UTF-8". I just changed the script to always use size 1 and bundler/inline:
> 

The input size isn't set correctly after your change to the script:

```
➜  ~ ASDF_RUBY_VERSION=3.2.3 ruby repro.eti.rb 1
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
➜  ~ ASDF_RUBY_VERSION=3.2.3 ruby repro.rb
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"
➜  ~ diff repro.eti.rb repro.rb
44,46c44
< SIZE = ARGV[0].to_i || 32
<
< input = ("\xC0" * SIZE) + ' '
---
> input = "\xC0"
➜  ~
```
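
A small sketch of the size difference, assuming the two input expressions shown in this thread: the modified script appends a space, so with `SIZE = 1` its input is two bytes long, while the failing case in the report needs a one-byte string. The variable names `padded` and `single` are illustrative only.

```ruby
size = 1
padded = (("\xC0" * size) + " ").b  # input shape in the modified script
single = "\xC0".b                   # one-byte input shape

p padded.bytesize  # prints 2
p single.bytesize  # prints 1
```

Since the report states that strings longer than 1 byte raise correctly on every tested version, the two-byte input would be expected to print `OK` everywhere.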

----------------------------------------
Bug #20412: UTF-8 String encoding behavior differs between 3.2, 3.3 and master
https://bugs.ruby-lang.org/issues/20412#change-107856

* Author: bannable (Joe Truba)
* Status: Open
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
