ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized
@ 2021-09-01  8:17 byroot (Jean Boussier)
  2021-09-01 17:57 ` [ruby-core:105107] " byroot (Jean Boussier)
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: byroot (Jean Boussier) @ 2021-09-01  8:17 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been reported by byroot (Jean Boussier).

----------------------------------------
Bug #18141: Marshal load with proc yield strings before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
I assume this is a bug because I can't find any spec or test for this behaviour.

Consider the following script:
```ruby
payload = Marshal.dump("foo")

# Keep the loaded object so it can be inspected after the proc has run.
string = Marshal.load(payload, -> (obj) {
  if obj.is_a?(String)
    p [obj, obj.encoding]
  end
  obj
})
p [:final, string, string.encoding]
```

outputs:
```ruby
["foo", #<Encoding:ASCII-8BIT>]
[:final, "foo", #<Encoding:UTF-8>]
```

So `Marshal` calls the proc before the string gets its encoding assigned. This is because the encoding is stored alongside the string, in a `TYPE_IVAR` wrapper. I think in such cases `Marshal` should delay calling the proc until the object is fully restored.
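
To make the `TYPE_IVAR` point concrete, here is a minimal sketch that inspects the dumped bytes. It assumes a UTF-8 string literal and the current Marshal format (4.8); the variable names are only illustrative.

```ruby
payload = Marshal.dump("foo") # "foo" is a UTF-8 string literal

# Bytes 0-1: Marshal format version.
version = payload.bytes[0, 2]
# Byte 2: 'I' marks an object that is followed by instance variables (TYPE_IVAR).
ivar_marker = payload[2]
# Byte 3: '"' marks the wrapped raw string (TYPE_STRING).
string_marker = payload[3]
# Last two bytes: the symbol name 'E' and 'T' (true), the pair that stands for UTF-8.
encoding_tail = payload[-2, 2]

p version        # → [4, 8]
p ivar_marker    # → "I"
p string_marker  # → "\""
p encoding_tail  # → "ET"
```

The string body comes first and the encoding pair only at the very end of the payload, which is consistent with the proc seeing an `ASCII-8BIT` string when it is called before the instance variables are read.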

A corollary to this behaviour is that the following code:

```ruby
Marshal.load(payload, :freeze.to_proc)
```

raises `can't modify frozen String: "foo" (FrozenError)`.
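
On an affected Ruby, one way to sidestep the problem is to freeze only after `Marshal.load` has returned. The sketch below is not part of the report; `deep_freeze` is a hypothetical helper that only walks arrays, hashes and instance variables.

```ruby
# Hypothetical helper: recursively freeze an object graph that is already fully loaded.
def deep_freeze(obj, seen = {}.compare_by_identity)
  return obj if seen.key?(obj) # guard against circular references
  seen[obj] = true

  case obj
  when Array
    obj.each { |element| deep_freeze(element, seen) }
  when Hash
    obj.each do |key, value|
      deep_freeze(key, seen)
      deep_freeze(value, seen)
    end
  end

  # Instance variables (if any) are frozen as well.
  obj.instance_variables.each do |name|
    deep_freeze(obj.instance_variable_get(name), seen)
  end

  obj.freeze
end

payload = Marshal.dump(["foo", { "bar" => "baz" }])
loaded = deep_freeze(Marshal.load(payload))

p loaded.frozen?       # → true
p loaded[0].encoding   # → #<Encoding:UTF-8>
```

Because no proc is passed to `Marshal.load`, every object is fully restored before anything gets frozen.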




-- 
https://bugs.ruby-lang.org/


* [ruby-core:105107] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized
  2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
@ 2021-09-01 17:57 ` byroot (Jean Boussier)
  2021-09-02  0:40 ` [ruby-core:105113] " nobu (Nobuyoshi Nakada)
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: byroot (Jean Boussier) @ 2021-09-01 17:57 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been updated by byroot (Jean Boussier).


I potentially have a fix: https://github.com/ruby/ruby/pull/4797

----------------------------------------
Bug #18141: Marshal load with proc yield strings before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141#change-93518

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------

* [ruby-core:105113] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized
  2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
  2021-09-01 17:57 ` [ruby-core:105107] " byroot (Jean Boussier)
@ 2021-09-02  0:40 ` nobu (Nobuyoshi Nakada)
  2021-09-17 14:19 ` [ruby-core:105327] " byroot (Jean Boussier)
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: nobu (Nobuyoshi Nakada) @ 2021-09-02  0:40 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been updated by nobu (Nobuyoshi Nakada).


Should use `ruby_bug "#18141", ""..."3.1"` instead of `ruby_version_is`.

----------------------------------------
Bug #18141: Marshal load with proc yield strings before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141#change-93525

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------

* [ruby-core:105327] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized
  2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
  2021-09-01 17:57 ` [ruby-core:105107] " byroot (Jean Boussier)
  2021-09-02  0:40 ` [ruby-core:105113] " nobu (Nobuyoshi Nakada)
@ 2021-09-17 14:19 ` byroot (Jean Boussier)
  2021-09-18  7:34 ` [ruby-core:105336] " nagachika (Tomoyuki Chikanaga)
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: byroot (Jean Boussier) @ 2021-09-17 14:19 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been updated by byroot (Jean Boussier).


So while working on https://bugs.ruby-lang.org/issues/18148, I discovered that many other types of objects are impacted.


Just a few examples:
```ruby
def round_trip(obj, proc = ->(o) { o.freeze })
  Marshal.load(Marshal.dump(obj), proc)
end

h = {}
h.instance_variable_set(:@foo, 42)
round_trip(h) rescue p $! #<FrozenError: can't modify frozen Hash>

a = []
a.instance_variable_set(:@foo, 42)
round_trip(a) rescue p $! #<FrozenError: can't modify frozen Array>
```

Also, probably by design, since the proc's return value replaces the object:
```ruby
a = {}
a.instance_variable_set(:@foo, 42)
round_trip(a, proc { 24 }) rescue p $! #<FrozenError: can't modify frozen Integer>
```


I fixed most cases in https://github.com/ruby/ruby/pull/4859, which is my current attempt at implementing https://bugs.ruby-lang.org/issues/18148, but since I just noticed this was marked for backport, I might need to split the bug fix from the new feature. No?

----------------------------------------
Bug #18141: Marshal load with proc yield strings before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141#change-93742

* Author: byroot (Jean Boussier)
* Status: Closed
* Priority: Normal
* Backport: 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED
----------------------------------------

* [ruby-core:105336] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized
  2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
                   ` (2 preceding siblings ...)
  2021-09-17 14:19 ` [ruby-core:105327] " byroot (Jean Boussier)
@ 2021-09-18  7:34 ` nagachika (Tomoyuki Chikanaga)
  2021-09-18 13:45 ` [ruby-core:105342] " byroot (Jean Boussier)
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: nagachika (Tomoyuki Chikanaga) @ 2021-09-18  7:34 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been updated by nagachika (Tomoyuki Chikanaga).


Hello byroot,
Thank you for investigating the issue. Yes, a patch with only the bug fix is very helpful for maintaining the stable branches.

----------------------------------------
Bug #18141: Marshal load with proc yield strings before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141#change-93750

* Author: byroot (Jean Boussier)
* Status: Closed
* Priority: Normal
* Backport: 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED
----------------------------------------

* [ruby-core:105342] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized
  2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
                   ` (3 preceding siblings ...)
  2021-09-18  7:34 ` [ruby-core:105336] " nagachika (Tomoyuki Chikanaga)
@ 2021-09-18 13:45 ` byroot (Jean Boussier)
  2021-09-28  8:42 ` [ruby-core:105462] [Ruby master Bug#18141] Marshal load with proc yield objects " byroot (Jean Boussier)
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: byroot (Jean Boussier) @ 2021-09-18 13:45 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been updated by byroot (Jean Boussier).


I made a followup patch: https://github.com/ruby/ruby/pull/4866

It now handles similar bugs with `Array`, `Hash` and other mutable objects. It also handles circular objects.
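
For readers unfamiliar with the circular case, here is a minimal sketch of a self-referencing array round-tripped through a proc. The proc simply records what it sees and returns the object unchanged; how many times it is called may differ between Ruby versions, so no call count is assumed.

```ruby
a = []
a << a # the array contains itself

seen_classes = []
loaded = Marshal.load(Marshal.dump(a), ->(obj) {
  seen_classes << obj.class
  obj
})

# The loaded array still refers to itself.
p loaded.equal?(loaded[0])
p seen_classes
```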

----------------------------------------
Bug #18141: Marshal load with proc yield strings before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141#change-93756

* Author: byroot (Jean Boussier)
* Status: Closed
* Priority: Normal
* Backport: 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED
----------------------------------------

* [ruby-core:105462] [Ruby master Bug#18141] Marshal load with proc yield objects before they are fully initialized
  2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
                   ` (4 preceding siblings ...)
  2021-09-18 13:45 ` [ruby-core:105342] " byroot (Jean Boussier)
@ 2021-09-28  8:42 ` byroot (Jean Boussier)
  2021-09-30 15:33 ` [ruby-core:105509] " byroot (Jean Boussier)
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: byroot (Jean Boussier) @ 2021-09-28  8:42 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been updated by byroot (Jean Boussier).

Status changed from Closed to Open
Description updated
Subject changed from Marshal load with proc yield strings before they are fully initialized  to Marshal load with proc yield objects before they are fully initialized 

I took the liberty of reopening this issue and rewriting it to be more generic.

I wonder if it wouldn't be simpler to revert the string-only fix (https://github.com/ruby/ruby/pull/4797) and then merge the more general one (https://github.com/ruby/ruby/pull/4866); this way it would be simpler to backport.

Any opinions?

----------------------------------------
Bug #18141: Marshal load with proc yield objects before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141#change-93912

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Backport: 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED
----------------------------------------
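The same happens with any instance variable on an `Array` or `Hash`: the proc is called before the instance variables are restored, so freezing the object inside the proc makes the load fail: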

```ruby
foo = {}
foo.instance_variable_set(:@bar, 42)

payload = Marshal.dump(foo)

object = Marshal.load(payload, ->(obj) {
  if obj.is_a?(Hash)
    p [obj, obj.instance_variable_get(:@bar)]
    obj.freeze
  end
  obj
})
```

```
[{}, nil]
/tmp/marshal.rb:6:in `load': can't modify frozen Hash: {} (FrozenError)
	from /tmp/marshal.rb:6:in `<main>'
```
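
To check how a given Ruby build behaves, a small probe along these lines may help. It is only a sketch: `marshal_proc_sees_complete_strings?` is a hypothetical name, and the result depends on whether the running interpreter includes the fix.

```ruby
# Hypothetical probe: does the proc passed to Marshal.load see the final encoding?
def marshal_proc_sees_complete_strings?
  encoding_in_proc = nil
  Marshal.load(Marshal.dump("foo"), ->(obj) {
    encoding_in_proc = obj.encoding if obj.is_a?(String)
    obj
  })
  # The dumped literal is UTF-8, so a fully restored string reports UTF-8 here.
  encoding_in_proc == Encoding::UTF_8
end

puts "proc sees fully initialized strings: #{marshal_proc_sees_complete_strings?}"
```

The probe never freezes anything, so it should run without raising on both affected and fixed versions.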




* [ruby-core:105509] [Ruby master Bug#18141] Marshal load with proc yield objects before they are fully initialized
  2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
                   ` (5 preceding siblings ...)
  2021-09-28  8:42 ` [ruby-core:105462] [Ruby master Bug#18141] Marshal load with proc yield objects " byroot (Jean Boussier)
@ 2021-09-30 15:33 ` byroot (Jean Boussier)
  2021-10-09  6:37 ` [ruby-core:105611] " nagachika (Tomoyuki Chikanaga)
  2021-11-24 10:36 ` [ruby-core:106261] " usa (Usaku NAKAMURA)
  8 siblings, 0 replies; 10+ messages in thread
From: byroot (Jean Boussier) @ 2021-09-30 15:33 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been updated by byroot (Jean Boussier).

Status changed from Open to Closed

https://github.com/ruby/ruby/pull/4866 was merged as 529fc204af84f825f98f83c34b004acbaa802615, closing.

----------------------------------------
Bug #18141: Marshal load with proc yield objects before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141#change-93957

* Author: byroot (Jean Boussier)
* Status: Closed
* Priority: Normal
* Backport: 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED
----------------------------------------

* [ruby-core:105611] [Ruby master Bug#18141] Marshal load with proc yield objects before they are fully initialized
  2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
                   ` (6 preceding siblings ...)
  2021-09-30 15:33 ` [ruby-core:105509] " byroot (Jean Boussier)
@ 2021-10-09  6:37 ` nagachika (Tomoyuki Chikanaga)
  2021-11-24 10:36 ` [ruby-core:106261] " usa (Usaku NAKAMURA)
  8 siblings, 0 replies; 10+ messages in thread
From: nagachika (Tomoyuki Chikanaga) @ 2021-10-09  6:37 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been updated by nagachika (Tomoyuki Chikanaga).

Backport changed from 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED to 2.6: REQUIRED, 2.7: REQUIRED, 3.0: DONE

ruby_3_0 fe9d33beb78d5c7932a5c2ca3953045c0ae751d5 merged revision(s) 89242279e61b023a81c58065c62a82de8829d0b3,529fc204af84f825f98f83c34b004acbaa802615.

----------------------------------------
Bug #18141: Marshal load with proc yield objects before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141#change-94100

* Author: byroot (Jean Boussier)
* Status: Closed
* Priority: Normal
* Backport: 2.6: REQUIRED, 2.7: REQUIRED, 3.0: DONE
----------------------------------------

* [ruby-core:106261] [Ruby master Bug#18141] Marshal load with proc yield objects before they are fully initialized
  2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
                   ` (7 preceding siblings ...)
  2021-10-09  6:37 ` [ruby-core:105611] " nagachika (Tomoyuki Chikanaga)
@ 2021-11-24 10:36 ` usa (Usaku NAKAMURA)
  8 siblings, 0 replies; 10+ messages in thread
From: usa (Usaku NAKAMURA) @ 2021-11-24 10:36 UTC (permalink / raw)
  To: ruby-core

Issue #18141 has been updated by usa (Usaku NAKAMURA).

Backport changed from 2.6: REQUIRED, 2.7: REQUIRED, 3.0: DONE to 2.6: REQUIRED, 2.7: DONE, 3.0: DONE

ruby_2_7 419266d44c54c6b75f1e824f060c8b388f7a405b merged revision(s) 89242279e61b023a81c58065c62a82de8829d0b3,529fc204af84f825f98f83c34b004acbaa802615.

----------------------------------------
Bug #18141: Marshal load with proc yield objects before they are fully initialized 
https://bugs.ruby-lang.org/issues/18141#change-94881

* Author: byroot (Jean Boussier)
* Status: Closed
* Priority: Normal
* Backport: 2.6: REQUIRED, 2.7: DONE, 3.0: DONE
----------------------------------------

Thread overview: 10+ messages
2021-09-01  8:17 [ruby-core:105104] [Ruby master Bug#18141] Marshal load with proc yield strings before they are fully initialized byroot (Jean Boussier)
2021-09-01 17:57 ` [ruby-core:105107] " byroot (Jean Boussier)
2021-09-02  0:40 ` [ruby-core:105113] " nobu (Nobuyoshi Nakada)
2021-09-17 14:19 ` [ruby-core:105327] " byroot (Jean Boussier)
2021-09-18  7:34 ` [ruby-core:105336] " nagachika (Tomoyuki Chikanaga)
2021-09-18 13:45 ` [ruby-core:105342] " byroot (Jean Boussier)
2021-09-28  8:42 ` [ruby-core:105462] [Ruby master Bug#18141] Marshal load with proc yield objects " byroot (Jean Boussier)
2021-09-30 15:33 ` [ruby-core:105509] " byroot (Jean Boussier)
2021-10-09  6:37 ` [ruby-core:105611] " nagachika (Tomoyuki Chikanaga)
2021-11-24 10:36 ` [ruby-core:106261] " usa (Usaku NAKAMURA)
