ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:94773] [Ruby master Bug#16143] BOM UTF-8 is not removed after rewind
From: dirk.meier.eickhoff+ruby-lang @ 2019-09-04  8:28 UTC
  To: ruby-core

Issue #16143 has been reported by Dirk (Dirk Meier-Eickhoff).

----------------------------------------
Bug #16143: BOM UTF-8 is not removed after rewind
https://bugs.ruby-lang.org/issues/16143

* Author: Dirk (Dirk Meier-Eickhoff)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-darwin17]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
I have a CSV file with "forced quotes" (every field quoted) and a UTF-8 BOM (`\xEF\xBB\xBF`) that CSV cannot read after a `rewind`: the first read works, but reading again after `rewind` raises "CSV::MalformedCSVError: Illegal quoting in line 1."

My UTF-8 CSV file with BOM:
``` ruby
File.open('bom_test.csv', 'w') do |io|
  io.write("\xEF\xBB\xBF\"Name\",\"City\"\n\"John Doe\",\"New York\"")
end
```
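To confirm that the BOM really is on disk, and to connect it with the `65279` that shows up in later output, the three leading bytes can be read back in binary mode. This is a minimal sketch; it rewrites `bom_test.csv` with `File.binwrite`, which should produce the same bytes as the snippet above on platforms without newline translation.

``` ruby
# Recreate the sample file byte-for-byte: UTF-8 BOM, quoted header, one quoted row
File.binwrite("bom_test.csv", "\xEF\xBB\xBF\"Name\",\"City\"\n\"John Doe\",\"New York\"")

# Read back the first three bytes; File.binread returns an ASCII-8BIT string
head = File.binread("bom_test.csv", 3)
p head.bytes                                          # => [239, 187, 191]

# Those bytes are the UTF-8 encoding of U+FEFF, i.e. codepoint 65279
p head.force_encoding(Encoding::UTF_8).unpack("U*")   # => [65279]
```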

Steps to reproduce the error:

``` ruby
require "csv"

# Case 1
csv = CSV.open('bom_test.csv', 'r:BOM|UTF-8', headers: true)
csv.shift
# => #<CSV::Row "Name":"John Doe" "City":"New York">
csv.rewind
csv.shift
# => CSV::MalformedCSVError (Illegal quoting in line 1.)

# Case 2
csv = CSV.open('bom_test.csv', 'r:BOM|UTF-8', headers: true)
csv.readline
# => #<CSV::Row "Name":"John Doe" "City":"New York">
csv.rewind
csv.readline
# => CSV::MalformedCSVError (Illegal quoting in line 1.)
```
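One possible workaround until `rewind` handles the BOM (a sketch, not something proposed in this thread) is to let Ruby strip the BOM once while reading the whole file into memory, then parse from the string. It assumes the file is small enough to load at once; `CSV.new` wraps the string in a `StringIO`, so `rewind` goes back to a position where no BOM exists.

``` ruby
require "csv"

# Same sample file as in the report: UTF-8 BOM, quoted header, one quoted row
File.binwrite("bom_test.csv", "\xEF\xBB\xBF\"Name\",\"City\"\n\"John Doe\",\"New York\"")

# File.read with the BOM|UTF-8 mode consumes the BOM; the resulting string has none
text = File.read("bom_test.csv", mode: "r:BOM|UTF-8")
p text.start_with?("\uFEFF")   # => false

# Parse from memory; rewinding the underlying StringIO never sees a BOM
csv = CSV.new(text, headers: true)
first = csv.shift
csv.rewind
again = csv.shift
p again.to_h == first.to_h     # => true
```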

Sutou Kouhei posted another reproduction in my original issue on the CSV gem: https://github.com/ruby/csv/issues/103
``` ruby
File.open("/tmp/a.txt", "w") do |x|
  x.puts("\xEF\xBB\xBFa,b,c")
end
File.open("/tmp/a.txt", "r:BOM|UTF-8") do |x|
  p x.gets.unpack("U*") # => [97, 44, 98, 44, 99, 10]
  x.rewind
  p x.gets.unpack("U*") # => [65279, 97, 44, 98, 44, 99, 10]
end
```

He said: "This [CSV] library relies on Ruby's BOM processing. It seems that Ruby's BOM processing doesn't support rewind."

My expectation is that reading a file with a BOM always returns the same content, whether it is the first read or a read after a `rewind`.
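Until then, the expected behavior can be approximated in user code. The sketch below uses a hypothetical helper, `rewind_skipping_bom` (not part of Ruby's IO API), that calls the real `IO#rewind` and then consumes a leading UTF-8 BOM by hand; it writes to a file in `Dir.tmpdir` instead of `/tmp/a.txt` for portability.

``` ruby
require "tmpdir"

# UTF-8 BOM as a binary string, for a byte-wise comparison
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: rewind, then skip a leading UTF-8 BOM if present
def rewind_skipping_bom(io)
  io.rewind
  # read with a length argument returns raw bytes (ASCII-8BIT), or nil at EOF
  io.rewind unless io.read(3) == UTF8_BOM
  io
end

path = File.join(Dir.tmpdir, "a.txt")
File.open(path, "w") { |x| x.puts("\xEF\xBB\xBFa,b,c") }

first_line = second_line = nil
File.open(path, "r:BOM|UTF-8") do |x|
  first_line = x.gets
  rewind_skipping_bom(x)
  second_line = x.gets
end

p first_line.unpack("U*")    # => [97, 44, 98, 44, 99, 10]
p second_line.unpack("U*")   # => [97, 44, 98, 44, 99, 10]
```

Because the helper is opt-in, code that relies on plain `IO#rewind` returning to offset 0 is unaffected.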



-- 
https://bugs.ruby-lang.org/


* [ruby-core:94785] [Ruby master Bug#16143] BOM UTF-8 is not removed after rewind
From: nobu @ 2019-09-05 13:57 UTC
  To: ruby-core

Issue #16143 has been updated by nobu (Nobuyoshi Nakada).


I doubt the spec of BOM handling is as simple and obvious as that.

I implemented it, but I'm sure something has been overlooked.
https://github.com/ruby/ruby/pull/2430

----------------------------------------
Bug #16143: BOM UTF-8 is not removed after rewind
https://bugs.ruby-lang.org/issues/16143#change-81403


* [ruby-core:95298] [Ruby master Bug#16143] BOM UTF-8 is not removed after rewind
From: kou @ 2019-10-10 21:59 UTC
  To: ruby-core

Issue #16143 has been updated by kou (Kouhei Sutou).


I've reviewed the pull request. I found a problem.

----------------------------------------
Bug #16143: BOM UTF-8 is not removed after rewind
https://bugs.ruby-lang.org/issues/16143#change-81982


* [ruby-core:95385] [Ruby master Bug#16143] BOM UTF-8 is not removed after rewind
From: akr @ 2019-10-17  6:30 UTC
  To: ruby-core

Issue #16143 has been updated by akr (Akira Tanaka).


I feel changing the default behavior of IO#rewind is dangerous.

We use IO#rewind when we modify a file in place.

``` ruby
open(filename, "r+") {|f|
  f.read
  f.rewind
  f.truncate(0)
  f.write "..."
}
```

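If IO#rewind moved the file pointer to just after the BOM,
the BOM would be replaced with NUL bytes by the code above:
truncate(0) empties the file but leaves the pointer at offset 3,
so the subsequent write leaves a three-byte gap that reads back as NULs.

For compatibility, I think adding a keyword argument to IO#rewind
is better than changing its default behavior.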
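The sketch below reproduces this scenario. Because the current `IO#rewind` returns to offset 0, it sets `f.pos = 3` by hand as a stand-in for a hypothetical rewind that skips the 3-byte BOM; the temporary file name is illustrative. A second run with the real `IO#rewind` is included for contrast.

``` ruby
require "tmpdir"

path = File.join(Dir.tmpdir, "bom_rewind_demo.txt")  # illustrative file name

# In-place rewrite, but with the pointer placed just after the BOM
File.binwrite(path, "\xEF\xBB\xBFold content")
File.open(path, "r+") do |f|
  f.read
  f.pos = 3        # stand-in for a hypothetical BOM-skipping rewind
  f.truncate(0)    # file length becomes 0; the pointer stays at offset 3
  f.write "new"    # lands at offset 3, leaving a zero-filled gap at bytes 0..2
end
skipped = File.binread(path)

# Same steps with the real IO#rewind, which moves the pointer back to offset 0
File.binwrite(path, "\xEF\xBB\xBFold content")
File.open(path, "r+") do |f|
  f.read
  f.rewind
  f.truncate(0)
  f.write "new"
end
plain = File.binread(path)

p skipped.bytes  # => [0, 0, 0, 110, 101, 119]
p plain          # => "new"
```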

----------------------------------------
Bug #16143: BOM UTF-8 is not removed after rewind
https://bugs.ruby-lang.org/issues/16143#change-82096

