ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:55709] [ruby-trunk - Bug #8585][Open] Time for CSV.generate grows quadratic with number of rows
@ 2013-06-30 10:04 peter_v (Peter Vandenabeele)
  2013-06-30 10:38 ` [ruby-core:55710] [ruby-trunk - Bug #8585] " peter_v (Peter Vandenabeele)
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: peter_v (Peter Vandenabeele) @ 2013-06-30 10:04 UTC
  To: ruby-core


Issue #8585 has been reported by peter_v (Peter Vandenabeele).

----------------------------------------
Bug #8585: Time for CSV.generate grows quadratic with number of rows
https://bugs.ruby-lang.org/issues/8585

Author: peter_v (Peter Vandenabeele)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: 2.1.0dev and 2.0.0
Backport: 1.9.3: UNKNOWN, 2.0.0: UNKNOWN


Hi,

I want to generate a CSV string from millions of rows.
I see that the time to create the string grows quadratically
with the number of rows. Because of this issue, I cannot use
ruby 2.0.0 to create the CSV file.

I did not see this problem in ruby 1.9.3.

I see the problem is present in ruby 2.0.0 and ruby-head.

Using ruby-head
===============

Installed with `rvm reinstall ruby-head` (built from version 3a01b9e)

peter_v@peter64:~/p/dbd$ rvm use ruby-head
Using /home/peter_v/.rvm/gems/ruby-head

peter_v@peter64:~/p/dbd$ ruby -v
ruby 2.1.0dev (2013-06-30) [x86_64-linux]

peter_v@peter64:~/p/dbd$ uname -a
Linux peter64 3.5.0-34-generic #55~precise1-Ubuntu SMP Fri Jun 7 16:25:50 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

peter_v@peter64:~/p/dbd$ rvm current
ruby-head

peter_v@peter64:~/p/dbd$ cat bin/test_4.rb 
#!/usr/bin/env ruby

count = ARGV[0].to_i
unless count > 0
  puts "Give a 'count' as first argument."
  exit(1)
end

require 'csv'

row_data = [
  "59ffbb3b-1e48-4c1f-81d8-d93afc84c966",
  "2013-06-28 19:14:55.975000806 UTC",
  "a11f290e-c441-41bc-8b8c-4e6c27b1b6fc",
  "c73e6241-d46f-4952-8377-c11372346d15",
  "test",
  "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0"]

puts "starting CSV.generate"
start_time = Time.now

csv_string = CSV.generate(force_quotes: true) do |csv|
  count.times do
    csv << row_data
  end
end

puts "CSV.generate took #{Time.now - start_time} seconds"

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 10_000
starting CSV.generate
CSV.generate took 1.01238478 seconds

real	0m1.045s
user	0m1.044s
sys	0m0.004s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 20_000
starting CSV.generate
CSV.generate took 3.815373614 seconds

real	0m3.847s
user	0m3.844s
sys	0m0.000s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 40_000
starting CSV.generate
CSV.generate took 17.176208859 seconds

real	0m17.212s
user	0m17.177s
sys	0m0.020s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 80_000
starting CSV.generate
CSV.generate took 71.400916725 seconds

real	1m11.436s
user	1m11.320s
sys	0m0.036s
peter_v@peter64:~/p/dbd$ 


Using ruby-1.9.3-p448
=====================

As expected, this shows LINEAR growth of time with the number of rows.

peter_v@peter64:~/p/dbd$ rvm use ruby-1.9.3
Using /home/peter_v/.rvm/gems/ruby-1.9.3-p448

peter_v@peter64:~/p/dbd$ ruby -v
ruby 1.9.3p448 (2013-06-27 revision 41675) [x86_64-linux]

peter_v@peter64:~/p/dbd$ rvm current
ruby-1.9.3-p448

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 10_000
starting CSV.generate
CSV.generate took 0.125396387 seconds

real	0m0.150s
user	0m0.140s
sys	0m0.008s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 20_000
starting CSV.generate
CSV.generate took 0.249746069 seconds

real	0m0.274s
user	0m0.268s
sys	0m0.004s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 40_000
starting CSV.generate
CSV.generate took 0.498180989 seconds

real	0m0.522s
user	0m0.504s
sys	0m0.016s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 80_000
starting CSV.generate
CSV.generate took 0.991481147 seconds

real	0m1.015s
user	0m1.000s
sys	0m0.016s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 100_000
starting CSV.generate
CSV.generate took 1.243347153 seconds

real	0m1.265s
user	0m1.240s
sys	0m0.020s

peter_v@peter64:~/p/dbd$ time bin/test_4.rb 1_000_000
starting CSV.generate
CSV.generate took 12.461711974 seconds

real	0m12.492s
user	0m12.405s
sys	0m0.080s
peter_v@peter64:~/p/dbd$ 
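
The "quadratic" versus "linear" reading of the timings above can be checked with a little arithmetic. The sketch below is only an illustration: it copies the 10_000 to 80_000 row measurements from the two runs and estimates the exponent k in "time ~ rows**k" for each doubling. The helper name `growth_exponents` is made up for this example and is not part of any library.

```ruby
# Timings copied from the runs above: row count => seconds.
RUBY_HEAD = {
  10_000 => 1.01238478,
  20_000 => 3.815373614,
  40_000 => 17.176208859,
  80_000 => 71.400916725
}.freeze

RUBY_193 = {
  10_000 => 0.125396387,
  20_000 => 0.249746069,
  40_000 => 0.498180989,
  80_000 => 0.991481147
}.freeze

# For each pair of consecutive measurements, estimate the exponent k in
# time ~ rows**k as log(t2 / t1) / log(n2 / n1).
# (growth_exponents is a hypothetical helper, named only for this sketch.)
def growth_exponents(timings)
  timings.sort.each_cons(2).map do |(n1, t1), (n2, t2)|
    Math.log(t2 / t1) / Math.log(n2.to_f / n1)
  end
end

head_exponents = growth_exponents(RUBY_HEAD)
r193_exponents = growth_exponents(RUBY_193)

puts "ruby-head exponents:  #{head_exponents.map { |k| k.round(2) }.inspect}"
puts "ruby-1.9.3 exponents: #{r193_exponents.map { |k| k.round(2) }.inspect}"
```

An exponent near 2 means the time roughly quadruples when the row count doubles, which is what the ruby-head numbers show; an exponent near 1 matches the ruby-1.9.3 numbers.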



-- 
http://bugs.ruby-lang.org/


* [ruby-core:55710] [ruby-trunk - Bug #8585] Time for CSV.generate grows quadratic with number of rows
  2013-06-30 10:04 [ruby-core:55709] [ruby-trunk - Bug #8585][Open] Time for CSV.generate grows quadratic with number of rows peter_v (Peter Vandenabeele)
@ 2013-06-30 10:38 ` peter_v (Peter Vandenabeele)
  2013-06-30 14:00 ` [ruby-core:55714] " Eregon (Benoit Daloze)
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: peter_v (Peter Vandenabeele) @ 2013-06-30 10:38 UTC
  To: ruby-core


Issue #8585 has been updated by peter_v (Peter Vandenabeele).


Using

CSV.open(filename, 'w') 

I can write large CSV files to disk in Ruby 2.0.0
(e.g. 10 M rows in 132 seconds)

Only writing to a string is a problem in
ruby 2.0.0 and ruby-head.
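
For reference, a minimal sketch of the file-based variant described above. The row count, the shortened row, and the output path are assumptions made for this illustration; only `CSV.open(filename, 'w')` and `force_quotes: true` come from this thread.

```ruby
require 'csv'
require 'tmpdir'

# Shortened row and small count, chosen only to keep the example quick.
row_data = ["59ffbb3b-1e48-4c1f-81d8-d93afc84c966", "test"]
count = 1_000

# Hypothetical output location inside a fresh temporary directory.
filename = File.join(Dir.mktmpdir, 'rows.csv')

# Rows are written to the file as they are appended, so no large
# in-memory string is built up.
CSV.open(filename, 'w', force_quotes: true) do |csv|
  count.times { csv << row_data }
end

line_count = File.foreach(filename).count
puts "wrote #{line_count} rows to #{filename}"
```

This only sidesteps the problem for output that may go to disk; it does not address the behaviour of CSV.generate itself.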

----------------------------------------
Bug #8585: Time for CSV.generate grows quadratic with number of rows
https://bugs.ruby-lang.org/issues/8585#change-40204


* [ruby-core:55714] [ruby-trunk - Bug #8585] Time for CSV.generate grows quadratic with number of rows
  2013-06-30 10:04 [ruby-core:55709] [ruby-trunk - Bug #8585][Open] Time for CSV.generate grows quadratic with number of rows peter_v (Peter Vandenabeele)
  2013-06-30 10:38 ` [ruby-core:55710] [ruby-trunk - Bug #8585] " peter_v (Peter Vandenabeele)
@ 2013-06-30 14:00 ` Eregon (Benoit Daloze)
  2013-06-30 14:04 ` [ruby-core:55715] " Eregon (Benoit Daloze)
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Eregon (Benoit Daloze) @ 2013-06-30 14:00 UTC
  To: ruby-core


Issue #8585 has been updated by Eregon (Benoit Daloze).


Good find!

A git bisect led to r37485 aka 58ef0f06:

Author: naruse
Date:   Tue Nov 6 00:49:57 2012 +0000

    * ruby.c (load_file_internal): set default source encoding as
      UTF-8 instead of US-ASCII. [ruby-core:46021] [Feature #6679]

    * parse.y (parser_initialize): set default parser encoding as
      UTF-8 instead of US-ASCII.

So it definitely looks encoding-related.
It is worrying that this causes such a performance regression.
----------------------------------------
Bug #8585: Time for CSV.generate grows quadratic with number of rows
https://bugs.ruby-lang.org/issues/8585#change-40208


* [ruby-core:55715] [ruby-trunk - Bug #8585] Time for CSV.generate grows quadratic with number of rows
  2013-06-30 10:04 [ruby-core:55709] [ruby-trunk - Bug #8585][Open] Time for CSV.generate grows quadratic with number of rows peter_v (Peter Vandenabeele)
  2013-06-30 10:38 ` [ruby-core:55710] [ruby-trunk - Bug #8585] " peter_v (Peter Vandenabeele)
  2013-06-30 14:00 ` [ruby-core:55714] " Eregon (Benoit Daloze)
@ 2013-06-30 14:04 ` Eregon (Benoit Daloze)
  2013-06-30 14:15 ` [ruby-core:55716] " charliesome (Charlie Somerville)
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Eregon (Benoit Daloze) @ 2013-06-30 14:04 UTC
  To: ruby-core


Issue #8585 has been updated by Eregon (Benoit Daloze).


Adding "# encoding: US-ASCII" at the top of the script makes it identical to the previous behavior, therefore taking the same time. I would certainly not call this a solution though.
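
To illustrate what the magic comment changes, here is a small sketch. It assumes Ruby 2.0 or later, where a source file without a magic comment is read as UTF-8. Because a magic comment only counts at the top of a file, the sketch writes two tiny scripts to temporary files and loads them; `literal_encoding_for` and `$literal_encoding` are names invented for this example.

```ruby
require 'tempfile'

# Writes `source` to a temporary .rb file, loads it, and returns the value
# the snippet stored in $literal_encoding.
# (Hypothetical helper, only for this illustration.)
def literal_encoding_for(source)
  Tempfile.create(['magic_comment', '.rb']) do |file|
    file.write(source)
    file.flush
    load file.path
  end
  $literal_encoding
end

# A script that starts with the magic comment suggested above.
with_comment = literal_encoding_for(<<~'RUBY')
  # encoding: US-ASCII
  $literal_encoding = "test".encoding
RUBY

# The same script without any magic comment.
without_comment = literal_encoding_for(<<~'RUBY')
  $literal_encoding = "test".encoding
RUBY

puts "with magic comment:    #{with_comment}"
puts "without magic comment: #{without_comment}"
```

So the string literals in the test script become US-ASCII strings again with the comment, and UTF-8 strings without it.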
----------------------------------------
Bug #8585: Time for CSV.generate grows quadratic with number of rows
https://bugs.ruby-lang.org/issues/8585#change-40209


* [ruby-core:55716] [ruby-trunk - Bug #8585] Time for CSV.generate grows quadratic with number of rows
  2013-06-30 10:04 [ruby-core:55709] [ruby-trunk - Bug #8585][Open] Time for CSV.generate grows quadratic with number of rows peter_v (Peter Vandenabeele)
                   ` (2 preceding siblings ...)
  2013-06-30 14:04 ` [ruby-core:55715] " Eregon (Benoit Daloze)
@ 2013-06-30 14:15 ` charliesome (Charlie Somerville)
  2013-06-30 14:38 ` [ruby-core:55717] " nobu (Nobuyoshi Nakada)
  2013-06-30 16:56 ` [ruby-core:55718] " Eregon (Benoit Daloze)
  5 siblings, 0 replies; 7+ messages in thread
From: charliesome (Charlie Somerville) @ 2013-06-30 14:15 UTC
  To: ruby-core


Issue #8585 has been updated by charliesome (Charlie Somerville).


This is most likely due to character indexing in UTF-8 being O(n).

I'd suggest reworking CSV.generate to not use character indexing, or convert input strings to UTF-32 first.
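
As background for this suggestion (whether it explains this bug is not established in the thread): UTF-8 is a variable-width encoding, so the byte offset of the n-th character cannot in general be computed without scanning, while UTF-32 uses four bytes for every character. A minimal sketch with two made-up sample strings:

```ruby
ascii    = "test"
accented = "t\u00e9st" # "tést"; a \u escape always yields a UTF-8 string

[ascii, accented].each do |s|
  utf32 = s.encode("UTF-32LE")
  puts "#{s.inspect}: chars=#{s.length} utf8_bytes=#{s.bytesize} " \
       "ascii_only=#{s.ascii_only?} utf32_bytes=#{utf32.bytesize}"
end
```

For the ASCII-only string the character count equals the UTF-8 byte count; for the accented string it does not, while the UTF-32 byte count is four times the character count in both cases. An implementation may take shortcuts for ASCII-only strings, so the O(n) cost need not apply to every UTF-8 string.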
----------------------------------------
Bug #8585: Time for CSV.generate grows quadratic with number of rows
https://bugs.ruby-lang.org/issues/8585#change-40210


* [ruby-core:55717] [ruby-trunk - Bug #8585] Time for CSV.generate grows quadratic with number of rows
  2013-06-30 10:04 [ruby-core:55709] [ruby-trunk - Bug #8585][Open] Time for CSV.generate grows quadratic with number of rows peter_v (Peter Vandenabeele)
                   ` (3 preceding siblings ...)
  2013-06-30 14:15 ` [ruby-core:55716] " charliesome (Charlie Somerville)
@ 2013-06-30 14:38 ` nobu (Nobuyoshi Nakada)
  2013-06-30 16:56 ` [ruby-core:55718] " Eregon (Benoit Daloze)
  5 siblings, 0 replies; 7+ messages in thread
From: nobu (Nobuyoshi Nakada) @ 2013-06-30 14:38 UTC
  To: ruby-core


Issue #8585 has been updated by nobu (Nobuyoshi Nakada).

File bug-8585.diff added

Eregon (Benoit Daloze) wrote:
> Adding "# encoding: US-ASCII" at the top of the script makes it identical to the previous behavior, therefore taking the same time. I would certainly not call this a solution though.

The file already has that line.

This slowdown seems to occur because `String#encode` in the `do_quote` lambda in `init_separators` is called for each field.
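
A rough sketch of the pattern described here, for readers without the csv.rb source at hand. This is NOT the actual csv.rb code and NOT the attached bug-8585.diff; the lambda names and the cache are assumptions made for illustration. Variant A calls `String#encode` on the quote character for every field; variant B does the same quoting but encodes the quote character once per field encoding.

```ruby
quote_char = '"'

# Variant A: re-encode the quote character for every single field.
quote_each_time = lambda do |field|
  field = String(field)
  q = quote_char.encode(field.encoding)
  q + field.gsub(q, q * 2) + q
end

# Variant B: encode the quote character once per field encoding and reuse it.
encoded_quotes = Hash.new { |cache, enc| cache[enc] = quote_char.encode(enc) }
quote_cached = lambda do |field|
  field = String(field)
  q = encoded_quotes[field.encoding]
  q + field.gsub(q, q * 2) + q
end

fields = ["test", 'say "hi"', 42]
line_a = fields.map(&quote_each_time).join(",")
line_b = fields.map(&quote_cached).join(",")

puts line_a
puts line_b
```

Both variants produce the same quoted line; the sketch only shows where a per-field `encode` call would sit, not why the overall run time would become quadratic.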

----------------------------------------
Bug #8585: Time for CSV.generate grows quadratic with number of rows
https://bugs.ruby-lang.org/issues/8585#change-40211


* [ruby-core:55718] [ruby-trunk - Bug #8585] Time for CSV.generate grows quadratic with number of rows
  2013-06-30 10:04 [ruby-core:55709] [ruby-trunk - Bug #8585][Open] Time for CSV.generate grows quadratic with number of rows peter_v (Peter Vandenabeele)
                   ` (4 preceding siblings ...)
  2013-06-30 14:38 ` [ruby-core:55717] " nobu (Nobuyoshi Nakada)
@ 2013-06-30 16:56 ` Eregon (Benoit Daloze)
  5 siblings, 0 replies; 7+ messages in thread
From: Eregon (Benoit Daloze) @ 2013-06-30 16:56 UTC
  To: ruby-core


Issue #8585 has been updated by Eregon (Benoit Daloze).


nobu (Nobuyoshi Nakada) wrote:
> The file already has that line.

I meant at the top of the test script provided in the description.

> This slowdown seems to occur because `String#encode` in the `do_quote` lambda in `init_separators` is called for each field.

Any idea why this makes the whole process quadratic?
----------------------------------------
Bug #8585: Time for CSV.generate grows quadratic with number of rows
https://bugs.ruby-lang.org/issues/8585#change-40212


end of thread

Thread overview: 7+ messages
2013-06-30 10:04 [ruby-core:55709] [ruby-trunk - Bug #8585][Open] Time for CSV.generate grows quadratic with number of rows peter_v (Peter Vandenabeele)
2013-06-30 10:38 ` [ruby-core:55710] [ruby-trunk - Bug #8585] " peter_v (Peter Vandenabeele)
2013-06-30 14:00 ` [ruby-core:55714] " Eregon (Benoit Daloze)
2013-06-30 14:04 ` [ruby-core:55715] " Eregon (Benoit Daloze)
2013-06-30 14:15 ` [ruby-core:55716] " charliesome (Charlie Somerville)
2013-06-30 14:38 ` [ruby-core:55717] " nobu (Nobuyoshi Nakada)
2013-06-30 16:56 ` [ruby-core:55718] " Eregon (Benoit Daloze)
