ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:91424] [Ruby trunk Feature#15590] Add dups to Array to find duplicates
       [not found] <redmine.issue-15590.20190206113123@ruby-lang.org>
@ 2019-02-06 11:31 ` xdmx
  2019-02-06 12:52 ` [ruby-core:91427] " shevegen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: xdmx @ 2019-02-06 11:31 UTC
  To: ruby-core

Issue #15590 has been reported by xdmx (Eric Bloom).

----------------------------------------
Feature #15590: Add dups to Array to find duplicates
https://bugs.ruby-lang.org/issues/15590

* Author: xdmx (Eric Bloom)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
Many times I find myself debugging data and needing to find duplicated values inside an Array.

Depending on the amount of data, this could be a simple `array.detect { |value| array.count(value) > 1 }` or a more performant approach like

```ruby
def dups_for(array)
  duplicated_values = []
  seen = {} # values encountered so far
  array.each do |value|
    # Record every occurrence after the first one.
    duplicated_values << value if seen[value]
    seen[value] = true
  end
  duplicated_values
end
```
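For illustration, the two approaches can be compared on a small array. This is only a sketch: `dups_for` is the helper from this message, while the `ids` sample data is made up for the example.

```ruby
# Helper from the message: collects every occurrence after the first.
def dups_for(array)
  duplicated_values = []
  seen = {}
  array.each do |value|
    duplicated_values << value if seen[value]
    seen[value] = true
  end
  duplicated_values
end

ids = [1, 2, 3, 2, 4, 1, 2] # made-up sample data

# detect stops at the first element that occurs more than once,
# and calls count (a full scan of the array) for each element it visits.
first_dup = ids.detect { |value| ids.count(value) > 1 }

# dups_for walks the array once and reports each repeated occurrence.
all_dups = dups_for(ids)

p first_dup # 1
p all_dups  # [2, 1, 2]
```

Note that the `detect` form only answers "is there a duplicate, and which one comes first", whereas the hash-based helper lists every repeat.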
 
It would be awesome if there were a way, directly in the core language, to call `dups` (or another name, since it could be too similar to the existing `dup`) on an array to get all the duplicated values.

I'd love to create a PR for this, but my C skills are non-existent 😞




-- 
https://bugs.ruby-lang.org/


* [ruby-core:91427] [Ruby trunk Feature#15590] Add dups to Array to find duplicates
       [not found] <redmine.issue-15590.20190206113123@ruby-lang.org>
  2019-02-06 11:31 ` [ruby-core:91424] [Ruby trunk Feature#15590] Add dups to Array to find duplicates xdmx
@ 2019-02-06 12:52 ` shevegen
  2019-02-06 12:54 ` [ruby-core:91428] " shevegen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: shevegen @ 2019-02-06 12:52 UTC
  To: ruby-core

Issue #15590 has been updated by shevegen (Robert A. Heiler).


I am not entirely sure whether I understood the proposal or the code example.

What do you mean by duplicated values in an Array? Do you mean something
"reversed", such as a complementary method to .uniq (Array#uniq)? Or is the
suggestion related to Object#dup? https://ruby-doc.org/core-2.6.1/Object.html#method-i-dup

I assume that you are referring to a complementary method to .uniq, but I am not
completely sure, so I hope it's ok for you to clarify that, just to make sure, when
you have some time.

(We may also have to look at the chosen name for the method; I am not sure if
.dups would be an acceptable name due to potential confusion.)

----------------------------------------
Feature #15590: Add dups to Array to find duplicates
https://bugs.ruby-lang.org/issues/15590#change-76686

* Author: xdmx (Eric Bloom)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------

* [ruby-core:91428] [Ruby trunk Feature#15590] Add dups to Array to find duplicates
       [not found] <redmine.issue-15590.20190206113123@ruby-lang.org>
  2019-02-06 11:31 ` [ruby-core:91424] [Ruby trunk Feature#15590] Add dups to Array to find duplicates xdmx
  2019-02-06 12:52 ` [ruby-core:91427] " shevegen
@ 2019-02-06 12:54 ` shevegen
  2019-02-06 13:07 ` [ruby-core:91430] " mame
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: shevegen @ 2019-02-06 12:54 UTC
  To: ruby-core

Issue #15590 has been updated by shevegen (Robert A. Heiler).


After re-reading, I think you may be referring to a method such as:

    .duplicates?

on class Array, right?

If this is the case, then I understand your example and proposal, and
I am slightly in favour, if it is meant as a complementary method to
.uniq. I remember that I had to do this a few times to detect duplicate
entries, e.g. faulty files that keep track of dependencies for programs
to compile and had the same content twice in the same .yml file. I am
sure others have had somewhat similar use cases here and there. But
again, right now I am not 100% sure if this is what Eric actually
suggested.

----------------------------------------
Feature #15590: Add dups to Array to find duplicates
https://bugs.ruby-lang.org/issues/15590#change-76687

* Author: xdmx (Eric Bloom)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------

* [ruby-core:91430] [Ruby trunk Feature#15590] Add dups to Array to find duplicates
       [not found] <redmine.issue-15590.20190206113123@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2019-02-06 12:54 ` [ruby-core:91428] " shevegen
@ 2019-02-06 13:07 ` mame
  2019-02-06 13:49 ` [ruby-core:91433] " xdmx
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: mame @ 2019-02-06 13:07 UTC
  To: ruby-core

Issue #15590 has been updated by mame (Yusuke Endoh).


> Many times I find myself debugging data and needing to find duplicated values inside an Array

Could you elaborate on the use case?

----------------------------------------
Feature #15590: Add dups to Array to find duplicates
https://bugs.ruby-lang.org/issues/15590#change-76689

* Author: xdmx (Eric Bloom)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------

* [ruby-core:91433] [Ruby trunk Feature#15590] Add dups to Array to find duplicates
       [not found] <redmine.issue-15590.20190206113123@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2019-02-06 13:07 ` [ruby-core:91430] " mame
@ 2019-02-06 13:49 ` xdmx
  2019-02-06 14:20 ` [ruby-core:91434] " tad.a.digger
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: xdmx @ 2019-02-06 13:49 UTC
  To: ruby-core

Issue #15590 has been updated by xdmx (Eric Bloom).


> I assume that you are referring to a complementary method to .uniq, but I am not
> completely sure, so I hope it's ok for you to clarify that, just to make sure, when
> you have some time.

Sorry for not having included an example!

Yes, I mean it as a complementary method to `uniq`, and actually `duplicates` would be much better than `dups` :)

> Could you elaborate on the use case?

The use case is mostly this: you have a list of data (ids, names, codes, or similar) and you want to know which of them are duplicated in the array.

So for example, you have a list of cities: `["Tokyo", "Paris", "London", "Miami", "Paris", "Orlando", "Dubai", "Tokyo", "Paris"]` that includes some duplicated values (`Tokyo` and `Paris`), and you want to find out which ones are duplicated. The same applies to ids or other values. If the list grows to hundreds of values, it becomes hard to find them just by looking at the list :)

As the result, I'd probably expect a list of each duplicated occurrence (`["Paris", "Tokyo", "Paris"]`) rather than a unique version of them (`["Paris", "Tokyo"]`).
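To make the difference between the two result shapes concrete, here is a minimal sketch. The helper name `repeated_occurrences` is hypothetical (it is not a core method or part of the proposal); it just mirrors the hash-based approach from the issue description.

```ruby
cities = ["Tokyo", "Paris", "London", "Miami", "Paris",
          "Orlando", "Dubai", "Tokyo", "Paris"]

# Hypothetical helper, not a core method:
# returns every occurrence of a value after its first appearance.
def repeated_occurrences(array)
  seen = {}
  array.select do |value|
    already_seen = seen.key?(value)
    seen[value] = true
    already_seen
  end
end

every_repeat = repeated_occurrences(cities)
distinct     = every_repeat.uniq # collapse repeats to one entry per value

p every_repeat # ["Paris", "Tokyo", "Paris"]
p distinct     # ["Paris", "Tokyo"]
```

The per-occurrence form keeps information about how often a value is repeated; the unique form can always be derived from it with `uniq`, but not the other way around.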

I personally do this many times to check data in the database or from other sources (CSV, JSON) and discover duplicated records with the same name, code, or other values, especially while cleaning up legacy data where there were no previous constraints or checks.

----------------------------------------
Feature #15590: Add dups to Array to find duplicates
https://bugs.ruby-lang.org/issues/15590#change-76692

* Author: xdmx (Eric Bloom)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------

* [ruby-core:91434] [Ruby trunk Feature#15590] Add dups to Array to find duplicates
       [not found] <redmine.issue-15590.20190206113123@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2019-02-06 13:49 ` [ruby-core:91433] " xdmx
@ 2019-02-06 14:20 ` tad.a.digger
  2019-02-08  9:19 ` [ruby-core:91490] " sawadatsuyoshi
  2019-02-08 10:00 ` [ruby-core:91491] " nobu
  7 siblings, 0 replies; 8+ messages in thread
From: tad.a.digger @ 2019-02-06 14:20 UTC
  To: ruby-core

Issue #15590 has been updated by tad (Tadashi Saito).


How about `Set#add?` ?

```ruby
require 'set'

a = ["Tokyo", "Paris", "London", "Miami", "Paris", "Orlando", "Dubai", "Tokyo", "Paris"]
s = Set.new
p a.select{|e| !s.add?(e)} #=> ["Paris", "Tokyo", "Paris"]
```
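The one-liner relies on the return value of `Set#add?`: it returns the set itself when the element is newly added and `nil` when the element is already a member. A small sketch of that behaviour follows; the helper name `repeats_via_set` is made up for this illustration.

```ruby
require 'set'

seen = Set.new
first  = seen.add?("Paris") # the set itself (truthy): "Paris" was new
second = seen.add?("Paris") # nil: "Paris" was already a member

p first.equal?(seen) # true
p second.nil?        # true

# Hypothetical wrapper around the one-liner above:
# keep an element only when add? reports it was already in the set.
def repeats_via_set(array)
  seen = Set.new
  array.select { |element| !seen.add?(element) }
end

p repeats_via_set(["Tokyo", "Paris", "Tokyo", "Paris", "Paris"])
# ["Tokyo", "Paris", "Paris"]
```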

----------------------------------------
Feature #15590: Add dups to Array to find duplicates
https://bugs.ruby-lang.org/issues/15590#change-76693

* Author: xdmx (Eric Bloom)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------

* [ruby-core:91490] [Ruby trunk Feature#15590] Add dups to Array to find duplicates
       [not found] <redmine.issue-15590.20190206113123@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2019-02-06 14:20 ` [ruby-core:91434] " tad.a.digger
@ 2019-02-08  9:19 ` sawadatsuyoshi
  2019-02-08 10:00 ` [ruby-core:91491] " nobu
  7 siblings, 0 replies; 8+ messages in thread
From: sawadatsuyoshi @ 2019-02-08  9:19 UTC
  To: ruby-core

Issue #15590 has been updated by sawa (Tsuyoshi Sawada).


With the newly introduced `tally`, you can also do:

```ruby
a = ["Tokyo", "Paris", "London", "Miami", "Paris", "Orlando", "Dubai", "Tokyo", "Paris"]

a.tally(&:itself).flat_map{|k, v| Array.new(v - 1, k)}
#=> ["Tokyo", "Paris", "Paris"]
```

----------------------------------------
Feature #15590: Add dups to Array to find duplicates
https://bugs.ruby-lang.org/issues/15590#change-76750

* Author: xdmx (Eric Bloom)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------

* [ruby-core:91491] [Ruby trunk Feature#15590] Add dups to Array to find duplicates
       [not found] <redmine.issue-15590.20190206113123@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2019-02-08  9:19 ` [ruby-core:91490] " sawadatsuyoshi
@ 2019-02-08 10:00 ` nobu
  7 siblings, 0 replies; 8+ messages in thread
From: nobu @ 2019-02-08 10:00 UTC
  To: ruby-core

Issue #15590 has been updated by nobu (Nobuyoshi Nakada).


sawa (Tsuyoshi Sawada) wrote:
> ```ruby
> a.tally(&:itself).flat_map{|k, v| Array.new(v - 1, k)}
> ```

As `tally` does not take a block, `&:itself` is not used.
It's a mistake in the rdoc.
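
Following that remark, the same expression can be written without the block argument. This is a sketch assuming a Ruby version that ships `Enumerable#tally` (2.7 or later); the helper name `extra_copies` is made up for the example.

```ruby
# Hypothetical helper: for each value, emit one entry per occurrence
# beyond the first, based on the counts returned by tally.
def extra_copies(array)
  array.tally.flat_map { |value, count| Array.new(count - 1, value) }
end

a = ["Tokyo", "Paris", "London", "Miami", "Paris",
     "Orlando", "Dubai", "Tokyo", "Paris"]

p a.tally["Paris"] # 3
p extra_copies(a)  # ["Tokyo", "Paris", "Paris"]
```

Because the result is built from the tally hash, repeats are grouped by value in order of first appearance rather than listed in the order they occur in the array.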


----------------------------------------
Feature #15590: Add dups to Array to find duplicates
https://bugs.ruby-lang.org/issues/15590#change-76752

* Author: xdmx (Eric Bloom)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
