ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:89696] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
@ 2018-11-03 13:08 ` RedGreenBlueDiamond
  2018-11-03 15:51 ` [ruby-core:89697] " ruby-core
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: RedGreenBlueDiamond @ 2018-11-03 13:08 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been reported by RGBD (Oleg Zubchenko).

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281

* Author: RGBD (Oleg Zubchenko)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s1.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:89697] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
  2018-11-03 13:08 ` [ruby-core:89696] [Ruby trunk Feature#15281] Speed up Set#intersect with size check RedGreenBlueDiamond
@ 2018-11-03 15:51 ` ruby-core
  2018-11-03 17:10 ` [ruby-core:89699] " RedGreenBlueDiamond
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: ruby-core @ 2018-11-03 15:51 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by marcandre (Marc-Andre Lafortune).


Thanks for the patch.

Sadly, I don't think we can do that as `Set`s are ordered. That optimization would change the order of the resulting `Set` in some cases.

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-74740

* Author: RGBD (Oleg Zubchenko)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:89699] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
  2018-11-03 13:08 ` [ruby-core:89696] [Ruby trunk Feature#15281] Speed up Set#intersect with size check RedGreenBlueDiamond
  2018-11-03 15:51 ` [ruby-core:89697] " ruby-core
@ 2018-11-03 17:10 ` RedGreenBlueDiamond
  2018-11-03 18:42 ` [ruby-core:89700] " ruby-core
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: RedGreenBlueDiamond @ 2018-11-03 17:10 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by RGBD (Oleg Zubchenko).


marcandre (Marc-Andre Lafortune) wrote:
> Thanks for the patch.
> 
> Sadly, I don't think we can do that as `Set`s are ordered. That optimization would change the order of the resulting `Set` in some cases.

Where in the documentation can I read that?

[this](https://ruby-doc.org/stdlib-2.5.3/libdoc/set/rdoc/Set.html#method-i-to_a) says order is unspecified.
Top of the page states:
> Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-74742

* Author: RGBD (Oleg Zubchenko)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:89700] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2018-11-03 17:10 ` [ruby-core:89699] " RedGreenBlueDiamond
@ 2018-11-03 18:42 ` ruby-core
  2018-11-04  0:08 ` [ruby-core:89704] " duerst
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: ruby-core @ 2018-11-03 18:42 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by marcandre (Marc-Andre Lafortune).

Assignee set to matz (Yukihiro Matsumoto)

RGBD (Oleg Zubchenko) wrote:
> > Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.

Good point.

Note that this documentation is 15 years old and predates Hash being ordered.

I'd be enclined to say that Set should be officially ordered, even if "mathematically" speaking that doesn't make sense. I'm assigning this to Matz.

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-74743

* Author: RGBD (Oleg Zubchenko)
* Status: Open
* Priority: Normal
* Assignee: matz (Yukihiro Matsumoto)
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:89704] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2018-11-03 18:42 ` [ruby-core:89700] " ruby-core
@ 2018-11-04  0:08 ` duerst
  2018-11-06 20:35 ` [ruby-core:89733] " pdahorek
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: duerst @ 2018-11-04  0:08 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by duerst (Martin Dürst).


marcandre (Marc-Andre Lafortune) wrote:

> I'd be enclined to say that Set should be officially ordered, even if "mathematically" speaking that doesn't make sense. I'm assigning this to Matz.

I'd be inclined to say that Set should be officially UNordered, because they are mathematically unordered.

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-74747

* Author: RGBD (Oleg Zubchenko)
* Status: Open
* Priority: Normal
* Assignee: matz (Yukihiro Matsumoto)
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:89733] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2018-11-04  0:08 ` [ruby-core:89704] " duerst
@ 2018-11-06 20:35 ` pdahorek
  2019-04-27 20:01 ` [ruby-core:92446] " eregontp
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: pdahorek @ 2018-11-06 20:35 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by ahorek (Pavel Rosický).


> we can do that as Sets are ordered

IMO if they are, it's an implementation detail that you shouldn't rely on. Also there's more room to optimize unordered sets.

if you need an ordered (or indexed?) set, it should be a subclass that implements this behaviour on the top of the generic unordered set. Something like https://ruby-doc.org/stdlib-2.5.3/libdoc/set/rdoc/SortedSet.html

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-74778

* Author: RGBD (Oleg Zubchenko)
* Status: Open
* Priority: Normal
* Assignee: matz (Yukihiro Matsumoto)
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:92446] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2018-11-06 20:35 ` [ruby-core:89733] " pdahorek
@ 2019-04-27 20:01 ` eregontp
  2019-05-11 13:40 ` [ruby-core:92622] " nobu
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: eregontp @ 2019-04-27 20:01 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by Eregon (Benoit Daloze).


I think making Set unordered would be a big breaking change, similar to making Hash unordered (as it was in 1.8).
Maybe people forgot, but I find it pretty bad to work with unordered Hashes,
e.g., every iteration method like `each` yields a confusing order changing on potentially every mutation.

I'm also fairly confident there is code out there relying on Set ordering.

This proposal doesn't propose to make Set unordered though, but it would introduce non-determinism in the order of elements returned by Set#&.
That seems not ideal.

Also, this trivial optimization can already be done at the application level if needed, and there the change of ordering is obvious and conscious:
```ruby
set1, set2 = ...
if set1.size < set2.size
  set1 & set2
else
  set2 & set1
end
```

So I don't think it's worth introducing non-determinism in Set for something that can be expressed so simply in application code, with the advantage of the ordering trade-off being clear if done in application code.

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-77797

* Author: RGBD (Oleg Zubchenko)
* Status: Open
* Priority: Normal
* Assignee: matz (Yukihiro Matsumoto)
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:92622] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2019-04-27 20:01 ` [ruby-core:92446] " eregontp
@ 2019-05-11 13:40 ` nobu
  2019-05-21  2:52 ` [ruby-core:92736] " knu
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: nobu @ 2019-05-11 13:40 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by nobu (Nobuyoshi Nakada).

Assignee changed from matz (Yukihiro Matsumoto) to knu (Akinori MUSHA)

The author of set.rb is knu.

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-77982

* Author: RGBD (Oleg Zubchenko)
* Status: Open
* Priority: Normal
* Assignee: knu (Akinori MUSHA)
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:92736] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2019-05-11 13:40 ` [ruby-core:92622] " nobu
@ 2019-05-21  2:52 ` knu
  2019-05-21  8:25 ` [ruby-core:92740] " sawadatsuyoshi
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 12+ messages in thread
From: knu @ 2019-05-21  2:52 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by knu (Akinori MUSHA).


It is clearly stated that Set is an unordered collection and it doesn't even guarantee any determinacy about the order of elements, and as you know, it's just it happened to become ordered and deterministic when the underlying data structure (Hash) got an order.

That being said, I can understand the need for an ordered set if there are already some users using Set as such.  We could probably introduce OrderedSet for easier migration before making this "breaking" change (and maybe some others to follow) to Set, I guess?

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-78096

* Author: RGBD (Oleg Zubchenko)
* Status: Assigned
* Priority: Normal
* Assignee: knu (Akinori MUSHA)
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:92740] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2019-05-21  2:52 ` [ruby-core:92736] " knu
@ 2019-05-21  8:25 ` sawadatsuyoshi
  2019-05-21  9:42 ` [ruby-core:92742] " chris
  2019-05-21 13:09 ` [ruby-core:92743] " eregontp
  11 siblings, 0 replies; 12+ messages in thread
From: sawadatsuyoshi @ 2019-05-21  8:25 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by sawa (Tsuyoshi Sawada).


There is a mathematical term "ordered set", which means a different thing: an algebra in which the elements are given intrinsic order.

The concept in question should be named `Tuple`.

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-78102

* Author: RGBD (Oleg Zubchenko)
* Status: Assigned
* Priority: Normal
* Assignee: knu (Akinori MUSHA)
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:92742] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
                   ` (9 preceding siblings ...)
  2019-05-21  8:25 ` [ruby-core:92740] " sawadatsuyoshi
@ 2019-05-21  9:42 ` chris
  2019-05-21 13:09 ` [ruby-core:92743] " eregontp
  11 siblings, 0 replies; 12+ messages in thread
From: chris @ 2019-05-21  9:42 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by chrisseaton (Chris Seaton).


I'm not sure it should be called `Tuple` - to me that implies that elements can appear more than once - that it's a multi-set - and this isn't the case.

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-78104

* Author: RGBD (Oleg Zubchenko)
* Status: Assigned
* Priority: Normal
* Assignee: knu (Akinori MUSHA)
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:92743] [Ruby trunk Feature#15281] Speed up Set#intersect with size check.
       [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
                   ` (10 preceding siblings ...)
  2019-05-21  9:42 ` [ruby-core:92742] " chris
@ 2019-05-21 13:09 ` eregontp
  11 siblings, 0 replies; 12+ messages in thread
From: eregontp @ 2019-05-21 13:09 UTC (permalink / raw)
  To: ruby-core

Issue #15281 has been updated by Eregon (Benoit Daloze).


Just a note, for compatibility there is often a gap between what the documentation states and what Ruby programs assume.
So even though the documentation says it's unordered, I would guess there are programs relying on Sets being ordered since Ruby 1.9.

Maybe not so many programs, but I think we should evaluate before changing.

----------------------------------------
Feature #15281: Speed up Set#intersect with size check.
https://bugs.ruby-lang.org/issues/15281#change-78105

* Author: RGBD (Oleg Zubchenko)
* Status: Assigned
* Priority: Normal
* Assignee: knu (Akinori MUSHA)
* Target version: 
----------------------------------------
Current implementation computes set intersection s1 & s2 in O(s2.size) time.
It can be reduced to O([s1.size, s2.size].min) time.

Additional speedup comes from using #each instead of #do_with_enum.

See files attached for benchmarks.

[Pull Request](https://github.com/ruby/ruby/pull/2003)

P.S. using benchmark-ips gem

---Files--------------------------------
intersect.rb (1.91 KB)
intersect_standalone.rb (671 Bytes)
benchmark_results.txt (3.68 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-05-21 13:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <redmine.issue-15281.20181103130832@ruby-lang.org>
2018-11-03 13:08 ` [ruby-core:89696] [Ruby trunk Feature#15281] Speed up Set#intersect with size check RedGreenBlueDiamond
2018-11-03 15:51 ` [ruby-core:89697] " ruby-core
2018-11-03 17:10 ` [ruby-core:89699] " RedGreenBlueDiamond
2018-11-03 18:42 ` [ruby-core:89700] " ruby-core
2018-11-04  0:08 ` [ruby-core:89704] " duerst
2018-11-06 20:35 ` [ruby-core:89733] " pdahorek
2019-04-27 20:01 ` [ruby-core:92446] " eregontp
2019-05-11 13:40 ` [ruby-core:92622] " nobu
2019-05-21  2:52 ` [ruby-core:92736] " knu
2019-05-21  8:25 ` [ruby-core:92740] " sawadatsuyoshi
2019-05-21  9:42 ` [ruby-core:92742] " chris
2019-05-21 13:09 ` [ruby-core:92743] " eregontp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).