ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:66514] [ruby-trunk - Feature #10552] [Open] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
@ 2014-11-27  7:59 ` plasticchicken
  2014-11-27 15:03 ` [ruby-core:66532] [ruby-trunk - Feature #10552] " me
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: plasticchicken @ 2014-11-27  7:59 UTC
  To: ruby-core

Issue #10552 has been reported by Brian Hempel.

----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552

* Author: Brian Hempel
* Status: Open
* Priority: Normal
* Assignee: 
* Category: core
* Target version: 
----------------------------------------
Counting how many times a value appears in some collection has always been a bit clumsy in Ruby. While Ruby has enough constructs to do it in one line, it still requires knowing the folklore of the optimum solution as well as some acrobatic typing:

~~~ruby
%w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }
# => {"cat" => 1, "bird" => 2, "horse" => 1}
~~~

What if Ruby could count for us? This patch adds two methods to enumerables:

~~~ruby
%w[cat bird bird horse].frequencies
# => {"bird" => 2, "horse" => 1, "cat" => 1}

%w[cat bird bird horse].relative_frequencies
# => {"bird" => 0.5, "horse" => 0.25, "cat" => 0.25}
~~~

To make programmers happier, the returned hash has the most common values first. This is nice because, for example, finding the most common element of a collection becomes trivial:

~~~ruby
most_common, count = %w[cat bird bird horse].frequencies.first
~~~

Whereas the best you can do with vanilla Ruby is:

~~~ruby
most_common, count = %w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }.max_by(&:last)

# or...

most_common, count = %w[cat bird bird horse].group_by(&:to_s).map { |word, arr| [word, arr.size] }.max_by(&:last)
~~~

While I don't like the long method names, "frequencies" and "relative frequencies" are the terms used in basic statistics. http://en.wikipedia.org/wiki/Frequency_%28statistics%29
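
As a rough illustration of the behavior described above, here is a minimal pure-Ruby sketch. The attached patch itself is not reproduced in this message, so this is not its implementation: the module name `FrequencySketch` and the choice to take the collection as an argument are assumptions made only for the example.

```ruby
# Hypothetical stand-in for the proposed methods; not the code from the patch.
module FrequencySketch
  module_function

  # Count each distinct element, then order the pairs by descending count.
  def frequencies(enum)
    counts = enum.each_with_object(Hash.new(0)) { |item, hash| hash[item] += 1 }
    # Negating the count puts the most common values first.
    # The relative order of values with equal counts is not guaranteed.
    Hash[counts.sort_by { |_item, count| -count }]
  end

  # Divide each count by the total number of elements seen.
  def relative_frequencies(enum)
    counts = frequencies(enum)
    total = counts.values.inject(0, :+)
    counts.each_with_object({}) { |(item, count), hash| hash[item] = count.to_f / total }
  end
end

words = %w[cat bird bird horse]
most_common, count = FrequencySketch.frequencies(words).first
```

Because the result is sorted by count, `first` yields the most common element and its count, which is the convenience the proposal is after.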


---Files--------------------------------
add_enum_frequencies.patch (5.81 KB)


-- 
https://bugs.ruby-lang.org/


* [ruby-core:66532] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
  2014-11-27  7:59 ` [ruby-core:66514] [ruby-trunk - Feature #10552] [Open] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies plasticchicken
@ 2014-11-27 15:03 ` me
  2014-11-27 19:29 ` [ruby-core:66534] " plasticchicken
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: me @ 2014-11-27 15:03 UTC
  To: ruby-core

Issue #10552 has been updated by David Workman.


I like this idea, but I think it could be improved by allowing .frequencies to take a block, in which case it would count the frequencies of the block's return values, similar to how .all?, .any? and .none? take a block.

This would make the frequencies method useful not just on arrays of strings but also on more complex data structures, without having to do a .map first to massage the data into the desired format.
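
A small sketch of what the block form could look like, assuming a free-standing helper named `frequencies_of` (a hypothetical name; the patch under discussion does not take a block):

```ruby
# Hypothetical helper: counts block results when a block is given,
# and the elements themselves otherwise.
def frequencies_of(enum)
  enum.each_with_object(Hash.new(0)) do |item, hash|
    # yield inside this inner block calls the block passed to frequencies_of.
    key = block_given? ? yield(item) : item
    hash[key] += 1
  end
end

by_word   = frequencies_of(%w[cat bird bird horse])
by_length = frequencies_of(%w[cat bird bird horse], &:length)
```

With the block, the keys of the result are word lengths rather than the words, so no separate .map pass is needed.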

----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50147


* [ruby-core:66534] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
  2014-11-27  7:59 ` [ruby-core:66514] [ruby-trunk - Feature #10552] [Open] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies plasticchicken
  2014-11-27 15:03 ` [ruby-core:66532] [ruby-trunk - Feature #10552] " me
@ 2014-11-27 19:29 ` plasticchicken
  2014-11-28  7:15 ` [ruby-core:66540] " duerst
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: plasticchicken @ 2014-11-27 19:29 UTC
  To: ruby-core

Issue #10552 has been updated by Brian Hempel.


Thanks for the feedback, David. I can see a `map` functionality being useful, but here are some arguments against integrating `map`:

1. I was thinking the block could be reserved because in the future it might be nice to change the weighting: some elements might count as 1, but others are less important so each of them only counts as 0.5. However, I can't think of a good use case for that yet.
2. `any?`, `all?`, and `none?` return booleans, not collections. All of the other enumerable methods that return a collection return elements from the original enumerable. For example, `my_enum.group_by(&:relation)` has elements from `my_enum` in the hash value arrays. It's a small code smell that `my_enum.frequencies(&:relation)` would return a potentially large collection that contains nothing from `my_enum`.
3. `any?`, `all?`, and `none?` can exit early, so there's a performance improvement to `.any?(&:finished?)` compared to `.map(&:finished?).any?`. There would be little performance improvement here because `frequencies` always has to walk the entire collection.

On the other hand, there is one good argument for integrating `map`:

1. `Enumerable#count` takes a block to specify what to count, and `frequencies` is basically `count`, but on all elements at once.
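
The early-exit point and the `count` comparison can be checked with existing core methods. The call counters below are only instrumentation added for this illustration:

```ruby
numbers = [1, 2, 3, 4]

# any? with a block stops at the first truthy result.
short_circuit_calls = 0
numbers.any? { |n| short_circuit_calls += 1; n.even? }

# map has to visit every element before any? sees the results.
full_walk_calls = 0
numbers.map { |n| full_walk_calls += 1; n.even? }.any?

# count with a block counts the elements for which the block is truthy.
even_count = numbers.count(&:even?)
```

Here the block form of `any?` runs twice while the `map` version runs four times. A frequency count has no such shortcut, since every element contributes to the result.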


----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50149


* [ruby-core:66540] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2014-11-27 19:29 ` [ruby-core:66534] " plasticchicken
@ 2014-11-28  7:15 ` duerst
  2014-11-28  8:38 ` [ruby-core:66548] " plasticchicken
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: duerst @ 2014-11-28  7:15 UTC
  To: ruby-core

Issue #10552 has been updated by Martin Dürst.


frequencies is essentially a group_by with the values mapped with size/count.

So assuming something like issue #9970 or issue #7793 gets accepted, it could simply be written as
%w[cat bird bird horse].group_by {|x| x}.map_values {|v| v.count }
or, if we get an identity method (*), as:
%w[cat bird bird horse].group_by(&:identity).map_values &:count

While this may not be very short, it's a concise description of what actually happens. I think it would be better for Ruby to improve how such general transformations can be written, rather than add more and more specialized methods such as (relative_)frequencies. Such methods would better go into a statistics package (see issue #10228; that would be good to have, too, of course).

(*) I thought we had an issue for this, but couldn't find it.
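
Since `map_values` and `identity` are only proposals at this point in the thread, the following sketch substitutes a hypothetical top-level helper named `map_values` and an explicit identity block, so that the same pipeline can be run as-is:

```ruby
# Hypothetical helper standing in for the Hash#map_values proposed in issue #9970.
def map_values(hash)
  hash.each_with_object({}) { |(key, value), result| result[key] = yield(value) }
end

words = %w[cat bird bird horse]

# Group each word under itself; the block plays the role of `identity`.
groups = words.group_by { |word| word }

# Replace each group with its size.
counts = map_values(groups) { |members| members.count }
```

The intermediate `groups` hash holds every element of the collection, which is the main cost of this formulation compared with counting directly.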


----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50155


* [ruby-core:66548] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2014-11-28  7:15 ` [ruby-core:66540] " duerst
@ 2014-11-28  8:38 ` plasticchicken
  2014-11-29  3:04 ` [ruby-core:66560] " duerst
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: plasticchicken @ 2014-11-28  8:38 UTC
  To: ruby-core

Issue #10552 has been updated by Brian Hempel.


Yes, I would rather see `Hash#map_values` in Ruby before `Enumerable#frequencies`. However, if both `map_values` and `frequencies` were added, then we might not need `relative_frequencies`, since calculating it becomes cleaner:

~~~ruby
array = %w[cat bird bird horse]
array.frequencies.map_values { |n| n.to_f / array.size }
~~~

I think that counting everything up is more like pre-statistics. I want to count things more often than I want to take a mean or a standard deviation. Also, most statistical measures operate only on collections of numbers. In contrast, counting frequencies works on collections of anything, not just numbers.

We could call the method `counts` instead of `frequencies` to make it sound less like statistics and more like counting.

To revise your example in favor of this patch: if you want the frequencies sorted, the "manual way" becomes longer:

~~~ruby
%w[cat bird bird horse].group_by(&:identity).map_values(&:count).sort_by(&:last).reverse.to_h
~~~
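
For comparison, here is a version of that chain that uses only methods already in core, with plain `map` and `to_h` standing in for the proposed `map_values` and an explicit block standing in for `identity`. It is a sketch of the manual approach, not of the patch:

```ruby
array = %w[cat bird bird horse]

# Count via group_by; the explicit block stands in for the hypothetical `identity`.
counts = array.group_by { |word| word }.map { |word, members| [word, members.size] }.to_h

# Ascending sort on the count, reversed so the most common value comes first.
sorted = counts.sort_by(&:last).reverse.to_h

# Relative frequencies: each count divided by the collection size.
relative = counts.map { |word, n| [word, n.to_f / array.size] }.to_h
```

The order among values with equal counts is left unspecified here, because `sort_by` does not promise a stable sort.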



----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50164


* [ruby-core:66560] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2014-11-28  8:38 ` [ruby-core:66548] " plasticchicken
@ 2014-11-29  3:04 ` duerst
  2014-11-29  3:04 ` [ruby-core:66562] " duerst
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: duerst @ 2014-11-29  3:04 UTC
  To: ruby-core

Issue #10552 has been updated by Martin Dürst.

Related to Feature #9970: Add `Hash#map_keys` and `Hash#map_values` added

----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50173


* [ruby-core:66562] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2014-11-29  3:04 ` [ruby-core:66560] " duerst
@ 2014-11-29  3:04 ` duerst
  2014-11-29  3:04 ` [ruby-core:66565] " duerst
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: duerst @ 2014-11-29  3:04 UTC
  To: ruby-core

Issue #10552 has been updated by Martin Dürst.

Related to Feature #7793: New methods on Hash added

----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50175


* [ruby-core:66565] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2014-11-29  3:04 ` [ruby-core:66562] " duerst
@ 2014-11-29  3:04 ` duerst
  2014-11-30  3:57 ` [ruby-core:66582] " andrewm.bpi
  2014-11-30 11:56 ` [ruby-core:66591] " shevegen
  9 siblings, 0 replies; 10+ messages in thread
From: duerst @ 2014-11-29  3:04 UTC
  To: ruby-core

Issue #10552 has been updated by Martin Dürst.

Related to Feature #10228: Statistics module added

----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50177


* [ruby-core:66582] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2014-11-29  3:04 ` [ruby-core:66565] " duerst
@ 2014-11-30  3:57 ` andrewm.bpi
  2014-11-30 11:56 ` [ruby-core:66591] " shevegen
  9 siblings, 0 replies; 10+ messages in thread
From: andrewm.bpi @ 2014-11-30  3:57 UTC
  To: ruby-core

Issue #10552 has been updated by Andrew M.


Personally, I'd prefer the form `Enumerable#count_by` with a block, as this method seems very similar to `group_by` in my opinion.
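
To make the parallel with `group_by` concrete, a `count_by` could be sketched directly on top of it. The free-standing method below is hypothetical and only illustrates the shape of the idea:

```ruby
# Hypothetical count_by: same keys as group_by, but each value is the group size.
def count_by(enum, &block)
  raise ArgumentError, "count_by requires a block" unless block

  enum.group_by(&block).each_with_object({}) do |(key, members), counts|
    counts[key] = members.size
  end
end

length_counts = count_by(%w[cat bird bird horse], &:length)
parity_counts = count_by(1..6) { |n| n.even? }
```

Building on `group_by` keeps the sketch short, though a real implementation would more likely count in a single pass without materializing the groups.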

----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50197


* [ruby-core:66591] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
       [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2014-11-30  3:57 ` [ruby-core:66582] " andrewm.bpi
@ 2014-11-30 11:56 ` shevegen
  9 siblings, 0 replies; 10+ messages in thread
From: shevegen @ 2014-11-30 11:56 UTC
  To: ruby-core

Issue #10552 has been updated by Robert A. Heiler.


I like the word .frequencies - it seems nicer than each_with_object(Hash.new(0)) and also
than group_by.

I do not like the word .relative_frequencies, but I can understand why you want this - it
seems more like a subpart of statistics though, and would perhaps be better placed in some
extension to Ruby (either in math, or perhaps in statistics, which could be a
subproject of Ruby math).

On a side note, perhaps we can also improve Ruby's statistics functionality a bit. I'd
rather use Ruby than R; the speed difference is no issue for me, and Ruby is much nicer to
work with than R.

----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50207


Thread overview: 10+ messages
     [not found] <redmine.issue-10552.20141127075905@ruby-lang.org>
2014-11-27  7:59 ` [ruby-core:66514] [ruby-trunk - Feature #10552] [Open] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies plasticchicken
2014-11-27 15:03 ` [ruby-core:66532] [ruby-trunk - Feature #10552] " me
2014-11-27 19:29 ` [ruby-core:66534] " plasticchicken
2014-11-28  7:15 ` [ruby-core:66540] " duerst
2014-11-28  8:38 ` [ruby-core:66548] " plasticchicken
2014-11-29  3:04 ` [ruby-core:66560] " duerst
2014-11-29  3:04 ` [ruby-core:66562] " duerst
2014-11-29  3:04 ` [ruby-core:66565] " duerst
2014-11-30  3:57 ` [ruby-core:66582] " andrewm.bpi
2014-11-30 11:56 ` [ruby-core:66591] " shevegen
