ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:94114] [Ruby master Feature#16038] Provide a public WeakMap which compare by equality rather than identity
       [not found] <redmine.issue-16038.20190802165249@ruby-lang.org>
@ 2019-08-02 16:52 ` jean.boussier
  2019-08-29  6:45 ` [ruby-core:94651] [Ruby master Feature#16038] Provide a public WeakMap that compares by equality rather than by identity matz
  2019-08-29 10:12 ` [ruby-core:94663] " jean.boussier
  2 siblings, 0 replies; 3+ messages in thread
From: jean.boussier @ 2019-08-02 16:52 UTC (permalink / raw
  To: ruby-core

Issue #16038 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #16038: Provide a public WeakMap which compare by equality rather than identity 
https://bugs.ruby-lang.org/issues/16038

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
I know `ObjectSpace::WeakMap` isn't really supposed to be used, and that the blessed interface is `WeakRef`, however I'd like to make a case for a better public WeakMap.

### Usage

As described in [Feature #16035], `WeakMap` are useful for deduplicating "value objects". A typical use case is as follow:

```ruby
class Position
  REGISTRY = {}
  private_constant :REGISTRY

  class << self
    def new(*)
      instance = super
      REGISTRY[instance] ||= instance
    end
  end

  attr_reader :x, :y, :z

  def initialize(x, y, z)
    @x = x
    @y = y
    @z = z
    freeze
  end

  def hash
    self.class.hash ^
      x.hash >> 1 ^
      y.hash >> 2 ^
      y.hash >> 3
  end

  def ==(other)
    other.is_a?(Position) &&
      other.x == x &&
      other.y == y &&
      other.z == z
  end
  alias_method :eql?, :==
end

p Position.new(1, 2, 3).equal?(Position.new(1, 2, 3))
```

That's pretty much the pattern [I used in Rails to deduplicate database metadata and save lots of memory](https://github.com/rails/rails/blob/f3c68c59ed57302ca54f4dfad0e91dbff426962d/activerecord/lib/active_record/connection_adapters/deduplicable.rb).

The big downside here is that these value objects can't be GCed anymore, so this pattern is not viable in many case.

### Why not use WeakRef

A couple reasons, first when using this pattern the goal is to reduce memory usage, so having one extra `WeakRef` for every single value object is a bit counter productive. 

Then it's a bit annoying to work with, as you have to constantly check wether the reference is still alive, and/or rescue `WeakRef::RefError`.

Often, these two complications make the tradeoff not worth it.

### Ruby 2.7

Since [Feature #13498] `WeakMap` is a bit more usable as you can now use an interned string as unique key, e.g.

```ruby
class Position
  REGISTRY = ObjectSpace::WeakMap.new
  private_constant :REGISTRY

  class << self
    def new(*)
      instance = super
      REGISTRY[instance.unique_id] ||= instance
    end
  end

  attr_reader :x, :y, :z, :unique_id

  def initialize(x, y, z)
    @x = x
    @y = y
    @z = z
    @unique_id = -"#{self.class}-#{x},#{y},#{z}"
    freeze
  end

  def hash
    self.class.hash ^
      x.hash >> 1 ^
      y.hash >> 2 ^
      y.hash >> 3
  end

  def ==(other)
    other.is_a?(Position) &&
      other.x == x &&
      other.y == y &&
      other.z == z
  end
  alias_method :eql?, :==
end

p Position.new(1, 2, 3).equal?(Position.new(1, 2, 3))
```

That makes the pattern much easier to work with than dealing with `WeakRef`, but there is still that an extra instance.

### Proposal

What would be ideal would be a `WeakMap` that works by equality, so that the first snippet could simply replace `{}` by `WeakMap.new`. 

Since changing `ObjectSpace::WeakMap` behavior would cause issues, I can see two possiblities:

  - The best IMO would be to have a new top level `::WeakMap` be the equality based map, and `ObjectSpace::WeakMap` would remain as a semi-private interface for backing up `WeakRef`.
  - Or alternatively `ObjectSpace::WeakMap` could have a `compare_by_equality` method (inverse of `Hash#compare_by_identity`) to change it's behavior post instantiation.

I personally prefer the first one.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [ruby-core:94651] [Ruby master Feature#16038] Provide a public WeakMap that compares by equality rather than by identity
       [not found] <redmine.issue-16038.20190802165249@ruby-lang.org>
  2019-08-02 16:52 ` [ruby-core:94114] [Ruby master Feature#16038] Provide a public WeakMap which compare by equality rather than identity jean.boussier
@ 2019-08-29  6:45 ` matz
  2019-08-29 10:12 ` [ruby-core:94663] " jean.boussier
  2 siblings, 0 replies; 3+ messages in thread
From: matz @ 2019-08-29  6:45 UTC (permalink / raw
  To: ruby-core

Issue #16038 has been updated by matz (Yukihiro Matsumoto).


I am not sure if the proposal has real-world use-case. Can you elaborate?

Matz.


----------------------------------------
Feature #16038: Provide a public WeakMap that compares by equality rather than by identity 
https://bugs.ruby-lang.org/issues/16038#change-81250

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
I know `ObjectSpace::WeakMap` isn't really supposed to be used, and that the blessed interface is `WeakRef`. However, I'd like to make a case for a better public WeakMap.

### Usage

As described in [Feature #16035], `WeakMap` is useful for deduplicating "value objects". A typical use case is as follows:

```ruby
class Position
  REGISTRY = {}
  private_constant :REGISTRY

  class << self
    def new(*)
      instance = super
      REGISTRY[instance] ||= instance
    end
  end

  attr_reader :x, :y, :z

  def initialize(x, y, z)
    @x = x
    @y = y
    @z = z
    freeze
  end

  def hash
    self.class.hash ^
      x.hash >> 1 ^
      y.hash >> 2 ^
      y.hash >> 3
  end

  def ==(other)
    other.is_a?(Position) &&
      other.x == x &&
      other.y == y &&
      other.z == z
  end
  alias_method :eql?, :==
end

p Position.new(1, 2, 3).equal?(Position.new(1, 2, 3))
```

That's pretty much the pattern [I used in Rails to deduplicate database metadata and save lots of memory](https://github.com/rails/rails/blob/f3c68c59ed57302ca54f4dfad0e91dbff426962d/activerecord/lib/active_record/connection_adapters/deduplicable.rb).

The big downside here is that these value objects can't be GCed anymore, so this pattern is not viable in many case.

### Why not use WeakRef

A couple of reasons. First, when using this pattern, the goal is to reduce memory usage, so having one extra `WeakRef` for every single value object is a bit counter productive. 

Then it's a bit annoying to work with, as you have to constantly check wether the reference is still alive, and/or rescue `WeakRef::RefError`.

Often, these two complications make the tradeoff not worth it.

### Ruby 2.7

Since [Feature #13498] `WeakMap` is a bit more usable as you can now use an interned string as the unique key, e.g.

```ruby
class Position
  REGISTRY = ObjectSpace::WeakMap.new
  private_constant :REGISTRY

  class << self
    def new(*)
      instance = super
      REGISTRY[instance.unique_id] ||= instance
    end
  end

  attr_reader :x, :y, :z, :unique_id

  def initialize(x, y, z)
    @x = x
    @y = y
    @z = z
    @unique_id = -"#{self.class}-#{x},#{y},#{z}"
    freeze
  end

  def hash
    self.class.hash ^
      x.hash >> 1 ^
      y.hash >> 2 ^
      y.hash >> 3
  end

  def ==(other)
    other.is_a?(Position) &&
      other.x == x &&
      other.y == y &&
      other.z == z
  end
  alias_method :eql?, :==
end

p Position.new(1, 2, 3).equal?(Position.new(1, 2, 3))
```

That makes the pattern much easier to work with than dealing with `WeakRef`, but there is still that an extra instance.

### Proposal

What would be ideal would be a `WeakMap` that works by equality, so that the first snippet could simply replace `{}` by `WeakMap.new`. 

Changing `ObjectSpace::WeakMap`'s behavior would cause issues, and I see two possibilities:

  - The best IMO would be to have a new top level `::WeakMap` be the equality based map, and have `ObjectSpace::WeakMap` remain as a semi-private interface for backing up `WeakRef`.
  - Or alternatively, `ObjectSpace::WeakMap` could have a `compare_by_equality` method (inverse of `Hash#compare_by_identity`) to change its behavior post instantiation.

I personally prefer the first one.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [ruby-core:94663] [Ruby master Feature#16038] Provide a public WeakMap that compares by equality rather than by identity
       [not found] <redmine.issue-16038.20190802165249@ruby-lang.org>
  2019-08-02 16:52 ` [ruby-core:94114] [Ruby master Feature#16038] Provide a public WeakMap which compare by equality rather than identity jean.boussier
  2019-08-29  6:45 ` [ruby-core:94651] [Ruby master Feature#16038] Provide a public WeakMap that compares by equality rather than by identity matz
@ 2019-08-29 10:12 ` jean.boussier
  2 siblings, 0 replies; 3+ messages in thread
From: jean.boussier @ 2019-08-29 10:12 UTC (permalink / raw
  To: ruby-core

Issue #16038 has been updated by byroot (Jean Boussier).


@matz Of course. What prompted me to open this feature request is a feature I implemented in Rails where I use this exact pattern: https://github.com/rails/rails/pull/35891 (more specifically https://github.com/rails/rails/blob/48ae1ba36132d7deec0bd43ee50e076786011a5e/activerecord/lib/active_record/connection_adapters/deduplicable.rb)

In short Rails ORM keep a data structure in memory that reflect the database schema, and that structure contains quite a lot of duplications, for instance all tables have an `id BIGINT PRIMARY KEY` column, so if you have 100 tables, you'll end up with 100 identical `Column` objects in memory. 

By using the pattern described in this issue, I was able to get rid of all the deduplications, which resulted in massive memory saving (over 115MB for us, but granted our schema is massive).

But my worry now is that I made all these objects totally non garbage collectable, so if someone is instantiating these dynamically, it might cause a RAM based DOS vulnerability, an attacker could trigger a lot of instantiations until the application runs out of RAM. 

That is why I'd like to use a WeakMap instead, but for that I need it to have an equality semantic like a regular hash. If it existed, I believe this pattern could be used in many libraries and applications which can't do it today because it's too risky.

----------------------------------------
Feature #16038: Provide a public WeakMap that compares by equality rather than by identity 
https://bugs.ruby-lang.org/issues/16038#change-81266

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
I know `ObjectSpace::WeakMap` isn't really supposed to be used, and that the blessed interface is `WeakRef`. However, I'd like to make a case for a better public WeakMap.

### Usage

As described in [Feature #16035], `WeakMap` is useful for deduplicating "value objects". A typical use case is as follows:

```ruby
class Position
  REGISTRY = {}
  private_constant :REGISTRY

  class << self
    def new(*)
      instance = super
      REGISTRY[instance] ||= instance
    end
  end

  attr_reader :x, :y, :z

  def initialize(x, y, z)
    @x = x
    @y = y
    @z = z
    freeze
  end

  def hash
    self.class.hash ^
      x.hash >> 1 ^
      y.hash >> 2 ^
      y.hash >> 3
  end

  def ==(other)
    other.is_a?(Position) &&
      other.x == x &&
      other.y == y &&
      other.z == z
  end
  alias_method :eql?, :==
end

p Position.new(1, 2, 3).equal?(Position.new(1, 2, 3))
```

That's pretty much the pattern [I used in Rails to deduplicate database metadata and save lots of memory](https://github.com/rails/rails/blob/f3c68c59ed57302ca54f4dfad0e91dbff426962d/activerecord/lib/active_record/connection_adapters/deduplicable.rb).

The big downside here is that these value objects can't be GCed anymore, so this pattern is not viable in many case.

### Why not use WeakRef

A couple of reasons. First, when using this pattern, the goal is to reduce memory usage, so having one extra `WeakRef` for every single value object is a bit counter productive. 

Then it's a bit annoying to work with, as you have to constantly check wether the reference is still alive, and/or rescue `WeakRef::RefError`.

Often, these two complications make the tradeoff not worth it.

### Ruby 2.7

Since [Feature #13498] `WeakMap` is a bit more usable as you can now use an interned string as the unique key, e.g.

```ruby
class Position
  REGISTRY = ObjectSpace::WeakMap.new
  private_constant :REGISTRY

  class << self
    def new(*)
      instance = super
      REGISTRY[instance.unique_id] ||= instance
    end
  end

  attr_reader :x, :y, :z, :unique_id

  def initialize(x, y, z)
    @x = x
    @y = y
    @z = z
    @unique_id = -"#{self.class}-#{x},#{y},#{z}"
    freeze
  end

  def hash
    self.class.hash ^
      x.hash >> 1 ^
      y.hash >> 2 ^
      y.hash >> 3
  end

  def ==(other)
    other.is_a?(Position) &&
      other.x == x &&
      other.y == y &&
      other.z == z
  end
  alias_method :eql?, :==
end

p Position.new(1, 2, 3).equal?(Position.new(1, 2, 3))
```

That makes the pattern much easier to work with than dealing with `WeakRef`, but there is still that an extra instance.

### Proposal

What would be ideal would be a `WeakMap` that works by equality, so that the first snippet could simply replace `{}` by `WeakMap.new`. 

Changing `ObjectSpace::WeakMap`'s behavior would cause issues, and I see two possibilities:

  - The best IMO would be to have a new top level `::WeakMap` be the equality based map, and have `ObjectSpace::WeakMap` remain as a semi-private interface for backing up `WeakRef`.
  - Or alternatively, `ObjectSpace::WeakMap` could have a `compare_by_equality` method (inverse of `Hash#compare_by_identity`) to change its behavior post instantiation.

I personally prefer the first one.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-08-29 10:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <redmine.issue-16038.20190802165249@ruby-lang.org>
2019-08-02 16:52 ` [ruby-core:94114] [Ruby master Feature#16038] Provide a public WeakMap which compare by equality rather than identity jean.boussier
2019-08-29  6:45 ` [ruby-core:94651] [Ruby master Feature#16038] Provide a public WeakMap that compares by equality rather than by identity matz
2019-08-29 10:12 ` [ruby-core:94663] " jean.boussier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).