ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:71697] [Ruby trunk - Bug #11744] [Open] Open files being GC'ed while still in use
       [not found] <redmine.issue-11744.20151126181251@ruby-lang.org>
@ 2015-11-26 18:12 ` godfreykfc
  2015-11-26 18:25 ` [ruby-core:71698] [Ruby trunk - Bug #11744] " godfreykfc
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: godfreykfc @ 2015-11-26 18:12 UTC (permalink / raw)
  To: ruby-core

Issue #11744 has been reported by Godfrey Chan.

----------------------------------------
Bug #11744: Open files being GC'ed while still in use
https://bugs.ruby-lang.org/issues/11744

* Author: Godfrey Chan
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 2.2.3
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
Hello!

We recently added a feature on Rails that uses the [`listen` gem](https://github.com/guard/listen) under-the-hood. Since we enabled those tests, we have been seeing random failures on Travis CI like [these](https://travis-ci.org/rails/rails/jobs/92563035#L976):

```
  1) Error:
KeyGeneratorTest#test_Generating_a_key_of_an_alternative_length:
Errno::EBADF: Bad file descriptor @ fptr_finalize - /home/travis/build/rails/rails/activesupport/lib/active_support/key_generator.rb
    /home/travis/build/rails/rails/activesupport/test/key_generator_test.rb:16:in `setup'
```

What's happening here is that the [setup code](https://github.com/rails/rails/blob/master/activesupport/test/key_generator_test.rb#L16) for this test is referencing an [constant previously marked for autoload](https://github.com/rails/rails/blob/master/activesupport/lib/active_support.rb#L52), and something went wrong *during* autoload (inside Ruby).

(By the way, I was referring to [Ruby's `autoload` feature](http://ruby-doc.org/core-2.2.3/Module.html#method-i-autoload) which is not specific to Rails, although we did [enhance it to infer the filename](https://github.com/rails/rails/blob/e1e6499ede1dd196c03f650b95c3a0098c7c32ff/activesupport/lib/active_support/dependencies/autoload.rb).)

Because there are so many things going on in the test suite, I tried to narrow down the problem further by removing most of the work we do.

Eventually, I arrived at this [reproduction script](https://github.com/rails/rails/blob/9982d60b34b976a826358644d1157d2589b92e94/activesupport/test/file_evented_update_checker_test.rb#L4-L39).

I disabled the rest of the test suite so that this file is the only test being ran, in three different configurations:

```
(1) LISTEN=1 LISTEN_GC_FIX=1 bundle exec rake test
(2) LISTEN=1 bundle exec rake test
(3) LISTEN=0 bundle exec rake test
```

In each of the tests, we tried to do a very simple `File.open(..., 'r') { |f| ... }` (reading in the first 10 bytes to make sure we are dealing with the correct file) up to 5000 times.

In (1), we set up 20 listeners, do no work, and properly tear down each of them, and disabling the GC before opening the files.

In (2), we also setup the listeners but do not disable the GC.

In (3), we do not set up the listeners nor disable the GC.

Scenario (2) reliably fails on Travis CI, and the rest reliably passes: https://travis-ci.org/rails/rails/builds/93402196

```
/home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `close': Bad file descriptor @ fptr_finalize - vendor/bundle/ruby/2.2.0/cache/bundler/git/mail-c03100656df0e86c13a2d6103ac408da7a4e728c/objects/34/fdd44ee4de34a2f05b0088562e38797db604a8 (Errno::EBADF)
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `open'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `block (2 levels) in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `glob'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `block in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `chdir'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `<top (required)>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `require'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `each'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `block in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `select'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `<main>'
```

Note that we are using the "block form" `File.open` in the test here, and [the error](https://travis-ci.org/rails/rails/jobs/93402198#L4418) is happening during the implicit `close` call inside Ruby. Also note that we are *loading* the listen gem in all three cases, so merely having it loaded does not trigger the issue.

I was not able to reproduce this locally on a Mac, so this is probably Linux specific. From what I can tell, listen gem uses [rb-inotify](https://github.com/nex3/rb-inotify) (which in turns uses ffi) on Linux so that might be related too.

* * *

My hypothesis is that, so interaction between listen/rb-inotify/ffi during the listener setup/teardown phase is putting the Ruby VM/process into a bad state. In this state, somehow the VM is confused about what is being used. When the GC runs, it somehow thinks that the file we are currently reading (the `f` inside the `File.open(...) { |f| ... }` call) is no longer being used, causing it to be GC-ed and auto-closed. When the block finishes running (it did indeed open the file correctly, as it was able to read the first 10 bytes correctly), the [finalizer](https://github.com/ruby/ruby/blob/v2_2_3/io.c#L6387) code tries to close FD again, causing the OS to return a `EBADF` error.

While it is certainly possible that the listen/rb-inotify/ffi gems are doing something wrong, it seems quite strange that they could put the Ruby process in such a state, so I figured it is worth having Ruby core people look into this.

I tried to [reproduce this outside of Rails](https://github.com/chancancode/listen-bug) but so far I cannot make it work, I'll keep trying. (Feel free to fork it.)



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:71698] [Ruby trunk - Bug #11744] Open files being GC'ed while still in use
       [not found] <redmine.issue-11744.20151126181251@ruby-lang.org>
  2015-11-26 18:12 ` [ruby-core:71697] [Ruby trunk - Bug #11744] [Open] Open files being GC'ed while still in use godfreykfc
@ 2015-11-26 18:25 ` godfreykfc
  2015-11-27  5:46 ` [ruby-core:71701] " naruse
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: godfreykfc @ 2015-11-26 18:25 UTC (permalink / raw)
  To: ruby-core

Issue #11744 has been updated by Godfrey Chan.


We have found the issue – apparently, listen 3.0.4 and below [doesn't actually teardown the inotify watchers](https://github.com/guard/listen/issues/353). Upgrading to 3.0.5 (which implemented the teardown) fixes the issue for us.

However, it still seems strange that merely leaving ~20 inotify FDs around could trigger this GC issue on Ruby side (and that still seems incorrect regardless), so it's probably still worth investigating.

----------------------------------------
Bug #11744: Open files being GC'ed while still in use
https://bugs.ruby-lang.org/issues/11744#change-55105

* Author: Godfrey Chan
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 2.2.3
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
Hello!

We recently added a feature on Rails that uses the [`listen` gem](https://github.com/guard/listen) under-the-hood. Since we enabled those tests, we have been seeing random failures on Travis CI like [these](https://travis-ci.org/rails/rails/jobs/92563035#L976):

```
  1) Error:
KeyGeneratorTest#test_Generating_a_key_of_an_alternative_length:
Errno::EBADF: Bad file descriptor @ fptr_finalize - /home/travis/build/rails/rails/activesupport/lib/active_support/key_generator.rb
    /home/travis/build/rails/rails/activesupport/test/key_generator_test.rb:16:in `setup'
```

What's happening here is that the [setup code](https://github.com/rails/rails/blob/master/activesupport/test/key_generator_test.rb#L16) for this test is referencing an [constant previously marked for autoload](https://github.com/rails/rails/blob/master/activesupport/lib/active_support.rb#L52), and something went wrong *during* autoload (inside Ruby).

(By the way, I was referring to [Ruby's `autoload` feature](http://ruby-doc.org/core-2.2.3/Module.html#method-i-autoload) which is not specific to Rails, although we did [enhance it to infer the filename](https://github.com/rails/rails/blob/e1e6499ede1dd196c03f650b95c3a0098c7c32ff/activesupport/lib/active_support/dependencies/autoload.rb).)

Because there are so many things going on in the test suite, I tried to narrow down the problem further by removing most of the work we do.

Eventually, I arrived at this [reproduction script](https://github.com/rails/rails/blob/9982d60b34b976a826358644d1157d2589b92e94/activesupport/test/file_evented_update_checker_test.rb#L4-L39).

I disabled the rest of the test suite so that this file is the only test being ran, in three different configurations:

```
(1) LISTEN=1 LISTEN_GC_FIX=1 bundle exec rake test
(2) LISTEN=1 bundle exec rake test
(3) LISTEN=0 bundle exec rake test
```

In each of the tests, we tried to do a very simple `File.open(..., 'r') { |f| ... }` (reading in the first 10 bytes to make sure we are dealing with the correct file) up to 5000 times.

In (1), we set up 20 listeners, do no work, and properly tear down each of them, and disabling the GC before opening the files.

In (2), we also setup the listeners but do not disable the GC.

In (3), we do not set up the listeners nor disable the GC.

Scenario (2) reliably fails on Travis CI, and the rest reliably passes: https://travis-ci.org/rails/rails/builds/93402196

```
/home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `close': Bad file descriptor @ fptr_finalize - vendor/bundle/ruby/2.2.0/cache/bundler/git/mail-c03100656df0e86c13a2d6103ac408da7a4e728c/objects/34/fdd44ee4de34a2f05b0088562e38797db604a8 (Errno::EBADF)
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `open'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `block (2 levels) in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `glob'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `block in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `chdir'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `<top (required)>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `require'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `each'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `block in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `select'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `<main>'
```

Note that we are using the "block form" `File.open` in the test here, and [the error](https://travis-ci.org/rails/rails/jobs/93402198#L4418) is happening during the implicit `close` call inside Ruby. Also note that we are *loading* the listen gem in all three cases, so merely having it loaded does not trigger the issue.

I was not able to reproduce this locally on a Mac, so this is probably Linux specific. From what I can tell, listen gem uses [rb-inotify](https://github.com/nex3/rb-inotify) (which in turns uses ffi) on Linux so that might be related too.

* * *

My hypothesis is that, so interaction between listen/rb-inotify/ffi during the listener setup/teardown phase is putting the Ruby VM/process into a bad state. In this state, somehow the VM is confused about what is being used. When the GC runs, it somehow thinks that the file we are currently reading (the `f` inside the `File.open(...) { |f| ... }` call) is no longer being used, causing it to be GC-ed and auto-closed. When the block finishes running (it did indeed open the file correctly, as it was able to read the first 10 bytes correctly), the [finalizer](https://github.com/ruby/ruby/blob/v2_2_3/io.c#L6387) code tries to close FD again, causing the OS to return a `EBADF` error.

While it is certainly possible that the listen/rb-inotify/ffi gems are doing something wrong, it seems quite strange that they could put the Ruby process in such a state, so I figured it is worth having Ruby core people look into this.

I tried to [reproduce this outside of Rails](https://github.com/chancancode/listen-bug) but so far I cannot make it work, I'll keep trying. (Feel free to fork it.)



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:71701] [Ruby trunk - Bug #11744] Open files being GC'ed while still in use
       [not found] <redmine.issue-11744.20151126181251@ruby-lang.org>
  2015-11-26 18:12 ` [ruby-core:71697] [Ruby trunk - Bug #11744] [Open] Open files being GC'ed while still in use godfreykfc
  2015-11-26 18:25 ` [ruby-core:71698] [Ruby trunk - Bug #11744] " godfreykfc
@ 2015-11-27  5:46 ` naruse
  2015-11-27  9:22 ` [ruby-core:71703] " godfreykfc
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: naruse @ 2015-11-27  5:46 UTC (permalink / raw)
  To: ruby-core

Issue #11744 has been updated by Yui NARUSE.


In my experience, such false closing is caused by IO.for_fd without specifying autoclose: false.

You know Ruby's IO object automatically closes their fd on finalize.
It happens even when the object is created by IO.for_fd(fd) (or IO.new(fd) or other aliases),
of course the original user of the fd hit EBADF.
You can avoid this by specifying autoclose: false for IO.for_fd.

----------------------------------------
Bug #11744: Open files being GC'ed while still in use
https://bugs.ruby-lang.org/issues/11744#change-55107

* Author: Godfrey Chan
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 2.2.3
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
Hello!

We recently added a feature on Rails that uses the [`listen` gem](https://github.com/guard/listen) under-the-hood. Since we enabled those tests, we have been seeing random failures on Travis CI like [these](https://travis-ci.org/rails/rails/jobs/92563035#L976):

```
  1) Error:
KeyGeneratorTest#test_Generating_a_key_of_an_alternative_length:
Errno::EBADF: Bad file descriptor @ fptr_finalize - /home/travis/build/rails/rails/activesupport/lib/active_support/key_generator.rb
    /home/travis/build/rails/rails/activesupport/test/key_generator_test.rb:16:in `setup'
```

What's happening here is that the [setup code](https://github.com/rails/rails/blob/master/activesupport/test/key_generator_test.rb#L16) for this test is referencing an [constant previously marked for autoload](https://github.com/rails/rails/blob/master/activesupport/lib/active_support.rb#L52), and something went wrong *during* autoload (inside Ruby).

(By the way, I was referring to [Ruby's `autoload` feature](http://ruby-doc.org/core-2.2.3/Module.html#method-i-autoload) which is not specific to Rails, although we did [enhance it to infer the filename](https://github.com/rails/rails/blob/e1e6499ede1dd196c03f650b95c3a0098c7c32ff/activesupport/lib/active_support/dependencies/autoload.rb).)

Because there are so many things going on in the test suite, I tried to narrow down the problem further by removing most of the work we do.

Eventually, I arrived at this [reproduction script](https://github.com/rails/rails/blob/9982d60b34b976a826358644d1157d2589b92e94/activesupport/test/file_evented_update_checker_test.rb#L4-L39).

I disabled the rest of the test suite so that this file is the only test being ran, in three different configurations:

```
(1) LISTEN=1 LISTEN_GC_FIX=1 bundle exec rake test
(2) LISTEN=1 bundle exec rake test
(3) LISTEN=0 bundle exec rake test
```

In each of the tests, we tried to do a very simple `File.open(..., 'r') { |f| ... }` (reading in the first 10 bytes to make sure we are dealing with the correct file) up to 5000 times.

In (1), we set up 20 listeners, do no work, and properly tear down each of them, and disabling the GC before opening the files.

In (2), we also setup the listeners but do not disable the GC.

In (3), we do not set up the listeners nor disable the GC.

Scenario (2) reliably fails on Travis CI, and the rest reliably passes: https://travis-ci.org/rails/rails/builds/93402196

```
/home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `close': Bad file descriptor @ fptr_finalize - vendor/bundle/ruby/2.2.0/cache/bundler/git/mail-c03100656df0e86c13a2d6103ac408da7a4e728c/objects/34/fdd44ee4de34a2f05b0088562e38797db604a8 (Errno::EBADF)
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `open'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `block (2 levels) in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `glob'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `block in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `chdir'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `<top (required)>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `require'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `each'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `block in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `select'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `<main>'
```

Note that we are using the "block form" `File.open` in the test here, and [the error](https://travis-ci.org/rails/rails/jobs/93402198#L4418) is happening during the implicit `close` call inside Ruby. Also note that we are *loading* the listen gem in all three cases, so merely having it loaded does not trigger the issue.

I was not able to reproduce this locally on a Mac, so this is probably Linux specific. From what I can tell, listen gem uses [rb-inotify](https://github.com/nex3/rb-inotify) (which in turns uses ffi) on Linux so that might be related too.

* * *

My hypothesis is that, so interaction between listen/rb-inotify/ffi during the listener setup/teardown phase is putting the Ruby VM/process into a bad state. In this state, somehow the VM is confused about what is being used. When the GC runs, it somehow thinks that the file we are currently reading (the `f` inside the `File.open(...) { |f| ... }` call) is no longer being used, causing it to be GC-ed and auto-closed. When the block finishes running (it did indeed open the file correctly, as it was able to read the first 10 bytes correctly), the [finalizer](https://github.com/ruby/ruby/blob/v2_2_3/io.c#L6387) code tries to close FD again, causing the OS to return a `EBADF` error.

While it is certainly possible that the listen/rb-inotify/ffi gems are doing something wrong, it seems quite strange that they could put the Ruby process in such a state, so I figured it is worth having Ruby core people look into this.

I tried to [reproduce this outside of Rails](https://github.com/chancancode/listen-bug) but so far I cannot make it work, I'll keep trying. (Feel free to fork it.)



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:71703] [Ruby trunk - Bug #11744] Open files being GC'ed while still in use
       [not found] <redmine.issue-11744.20151126181251@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2015-11-27  5:46 ` [ruby-core:71701] " naruse
@ 2015-11-27  9:22 ` godfreykfc
  2015-11-27 16:35 ` [ruby-core:71707] " godfreykfc
  2015-11-27 18:16 ` [ruby-core:71708] [Ruby trunk - Bug #11744] [Closed] " naruse
  5 siblings, 0 replies; 6+ messages in thread
From: godfreykfc @ 2015-11-27  9:22 UTC (permalink / raw)
  To: ruby-core

Issue #11744 has been updated by Godfrey Chan.


> We have found the issue

Looks like I spoke too soon! The issue is back with listen 3.0.5.

> In my experience, such false closing is caused by IO.for_fd without specifying autoclose: false.

I don't *think* that's the case here, but I could be completely wrong.

In the [simplified reproduction script](https://github.com/rails/rails/blob/9982d60b34b976a826358644d1157d2589b92e94/activesupport/test/file_evented_update_checker_test.rb#L4-L39), all we are doing is this:

```
    Dir.glob("**/*") do |entry|
      if File.file?(entry) && !File.zero?(entry)
        File.open(entry, 'r') { |f| puts f.read(10).inspect }
      end
    end
```

As you can see, the GC happened in between `puts f.read(10).inspect` finished executing and the implicit `f.close` internal to `File.open`, and we don't call `IO.for_fd` here.

----------------------------------------
Bug #11744: Open files being GC'ed while still in use
https://bugs.ruby-lang.org/issues/11744#change-55109

* Author: Godfrey Chan
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 2.2.3
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
Hello!

We recently added a feature on Rails that uses the [`listen` gem](https://github.com/guard/listen) under-the-hood. Since we enabled those tests, we have been seeing random failures on Travis CI like [these](https://travis-ci.org/rails/rails/jobs/92563035#L976):

```
  1) Error:
KeyGeneratorTest#test_Generating_a_key_of_an_alternative_length:
Errno::EBADF: Bad file descriptor @ fptr_finalize - /home/travis/build/rails/rails/activesupport/lib/active_support/key_generator.rb
    /home/travis/build/rails/rails/activesupport/test/key_generator_test.rb:16:in `setup'
```

What's happening here is that the [setup code](https://github.com/rails/rails/blob/master/activesupport/test/key_generator_test.rb#L16) for this test is referencing an [constant previously marked for autoload](https://github.com/rails/rails/blob/master/activesupport/lib/active_support.rb#L52), and something went wrong *during* autoload (inside Ruby).

(By the way, I was referring to [Ruby's `autoload` feature](http://ruby-doc.org/core-2.2.3/Module.html#method-i-autoload) which is not specific to Rails, although we did [enhance it to infer the filename](https://github.com/rails/rails/blob/e1e6499ede1dd196c03f650b95c3a0098c7c32ff/activesupport/lib/active_support/dependencies/autoload.rb).)

Because there are so many things going on in the test suite, I tried to narrow down the problem further by removing most of the work we do.

Eventually, I arrived at this [reproduction script](https://github.com/rails/rails/blob/9982d60b34b976a826358644d1157d2589b92e94/activesupport/test/file_evented_update_checker_test.rb#L4-L39).

I disabled the rest of the test suite so that this file is the only test being ran, in three different configurations:

```
(1) LISTEN=1 LISTEN_GC_FIX=1 bundle exec rake test
(2) LISTEN=1 bundle exec rake test
(3) LISTEN=0 bundle exec rake test
```

In each of the tests, we tried to do a very simple `File.open(..., 'r') { |f| ... }` (reading in the first 10 bytes to make sure we are dealing with the correct file) up to 5000 times.

In (1), we set up 20 listeners, do no work, and properly tear down each of them, and disabling the GC before opening the files.

In (2), we also setup the listeners but do not disable the GC.

In (3), we do not set up the listeners nor disable the GC.

Scenario (2) reliably fails on Travis CI, and the rest reliably passes: https://travis-ci.org/rails/rails/builds/93402196

```
/home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `close': Bad file descriptor @ fptr_finalize - vendor/bundle/ruby/2.2.0/cache/bundler/git/mail-c03100656df0e86c13a2d6103ac408da7a4e728c/objects/34/fdd44ee4de34a2f05b0088562e38797db604a8 (Errno::EBADF)
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `open'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `block (2 levels) in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `glob'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `block in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `chdir'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `<top (required)>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `require'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `each'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `block in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `select'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `<main>'
```

Note that we are using the "block form" `File.open` in the test here, and [the error](https://travis-ci.org/rails/rails/jobs/93402198#L4418) is happening during the implicit `close` call inside Ruby. Also note that we are *loading* the listen gem in all three cases, so merely having it loaded does not trigger the issue.

I was not able to reproduce this locally on a Mac, so this is probably Linux specific. From what I can tell, listen gem uses [rb-inotify](https://github.com/nex3/rb-inotify) (which in turns uses ffi) on Linux so that might be related too.

* * *

My hypothesis is that, so interaction between listen/rb-inotify/ffi during the listener setup/teardown phase is putting the Ruby VM/process into a bad state. In this state, somehow the VM is confused about what is being used. When the GC runs, it somehow thinks that the file we are currently reading (the `f` inside the `File.open(...) { |f| ... }` call) is no longer being used, causing it to be GC-ed and auto-closed. When the block finishes running (it did indeed open the file correctly, as it was able to read the first 10 bytes correctly), the [finalizer](https://github.com/ruby/ruby/blob/v2_2_3/io.c#L6387) code tries to close FD again, causing the OS to return a `EBADF` error.

While it is certainly possible that the listen/rb-inotify/ffi gems are doing something wrong, it seems quite strange that they could put the Ruby process in such a state, so I figured it is worth having Ruby core people look into this.

I tried to [reproduce this outside of Rails](https://github.com/chancancode/listen-bug) but so far I cannot make it work, I'll keep trying. (Feel free to fork it.)



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:71707] [Ruby trunk - Bug #11744] Open files being GC'ed while still in use
       [not found] <redmine.issue-11744.20151126181251@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2015-11-27  9:22 ` [ruby-core:71703] " godfreykfc
@ 2015-11-27 16:35 ` godfreykfc
  2015-11-27 18:16 ` [ruby-core:71708] [Ruby trunk - Bug #11744] [Closed] " naruse
  5 siblings, 0 replies; 6+ messages in thread
From: godfreykfc @ 2015-11-27 16:35 UTC (permalink / raw)
  To: ruby-core

Issue #11744 has been updated by Godfrey Chan.


Actually, @nurse you are right!

It turns out that rb-inotify uses `IO.fd_for` in its code. At some point, the system starts recycling FD numbers, so we got the same FD number on our file as the rb-inotify object waiting to be GC'ed. When the GC hit at that moment, the finalizer would have auto-closed the "wrong" file.

So this can be closed. Thank you very much for helping us! We will send a patch to rb-inotify!

----------------------------------------
Bug #11744: Open files being GC'ed while still in use
https://bugs.ruby-lang.org/issues/11744#change-55112

* Author: Godfrey Chan
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 2.2.3
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
Hello!

We recently added a feature on Rails that uses the [`listen` gem](https://github.com/guard/listen) under-the-hood. Since we enabled those tests, we have been seeing random failures on Travis CI like [these](https://travis-ci.org/rails/rails/jobs/92563035#L976):

```
  1) Error:
KeyGeneratorTest#test_Generating_a_key_of_an_alternative_length:
Errno::EBADF: Bad file descriptor @ fptr_finalize - /home/travis/build/rails/rails/activesupport/lib/active_support/key_generator.rb
    /home/travis/build/rails/rails/activesupport/test/key_generator_test.rb:16:in `setup'
```

What's happening here is that the [setup code](https://github.com/rails/rails/blob/master/activesupport/test/key_generator_test.rb#L16) for this test is referencing an [constant previously marked for autoload](https://github.com/rails/rails/blob/master/activesupport/lib/active_support.rb#L52), and something went wrong *during* autoload (inside Ruby).

(By the way, I was referring to [Ruby's `autoload` feature](http://ruby-doc.org/core-2.2.3/Module.html#method-i-autoload) which is not specific to Rails, although we did [enhance it to infer the filename](https://github.com/rails/rails/blob/e1e6499ede1dd196c03f650b95c3a0098c7c32ff/activesupport/lib/active_support/dependencies/autoload.rb).)

Because there are so many things going on in the test suite, I tried to narrow down the problem further by removing most of the work we do.

Eventually, I arrived at this [reproduction script](https://github.com/rails/rails/blob/9982d60b34b976a826358644d1157d2589b92e94/activesupport/test/file_evented_update_checker_test.rb#L4-L39).

I disabled the rest of the test suite so that this file is the only test being ran, in three different configurations:

```
(1) LISTEN=1 LISTEN_GC_FIX=1 bundle exec rake test
(2) LISTEN=1 bundle exec rake test
(3) LISTEN=0 bundle exec rake test
```

In each of the tests, we tried to do a very simple `File.open(..., 'r') { |f| ... }` (reading in the first 10 bytes to make sure we are dealing with the correct file) up to 5000 times.

In (1), we set up 20 listeners, do no work, and properly tear down each of them, and disabling the GC before opening the files.

In (2), we also setup the listeners but do not disable the GC.

In (3), we do not set up the listeners nor disable the GC.

Scenario (2) reliably fails on Travis CI, and the rest reliably passes: https://travis-ci.org/rails/rails/builds/93402196

```
/home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `close': Bad file descriptor @ fptr_finalize - vendor/bundle/ruby/2.2.0/cache/bundler/git/mail-c03100656df0e86c13a2d6103ac408da7a4e728c/objects/34/fdd44ee4de34a2f05b0088562e38797db604a8 (Errno::EBADF)
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `open'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `block (2 levels) in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `glob'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `block in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `chdir'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `<top (required)>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `require'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `each'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `block in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `select'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `<main>'
```

Note that we are using the "block form" `File.open` in the test here, and [the error](https://travis-ci.org/rails/rails/jobs/93402198#L4418) is happening during the implicit `close` call inside Ruby. Also note that we are *loading* the listen gem in all three cases, so merely having it loaded does not trigger the issue.

I was not able to reproduce this locally on a Mac, so this is probably Linux specific. From what I can tell, listen gem uses [rb-inotify](https://github.com/nex3/rb-inotify) (which in turns uses ffi) on Linux so that might be related too.

* * *

My hypothesis is that, so interaction between listen/rb-inotify/ffi during the listener setup/teardown phase is putting the Ruby VM/process into a bad state. In this state, somehow the VM is confused about what is being used. When the GC runs, it somehow thinks that the file we are currently reading (the `f` inside the `File.open(...) { |f| ... }` call) is no longer being used, causing it to be GC-ed and auto-closed. When the block finishes running (it did indeed open the file correctly, as it was able to read the first 10 bytes correctly), the [finalizer](https://github.com/ruby/ruby/blob/v2_2_3/io.c#L6387) code tries to close FD again, causing the OS to return a `EBADF` error.

While it is certainly possible that the listen/rb-inotify/ffi gems are doing something wrong, it seems quite strange that they could put the Ruby process in such a state, so I figured it is worth having Ruby core people look into this.

I tried to [reproduce this outside of Rails](https://github.com/chancancode/listen-bug) but so far I cannot make it work, I'll keep trying. (Feel free to fork it.)



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:71708] [Ruby trunk - Bug #11744] [Closed] Open files being GC'ed while still in use
       [not found] <redmine.issue-11744.20151126181251@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2015-11-27 16:35 ` [ruby-core:71707] " godfreykfc
@ 2015-11-27 18:16 ` naruse
  5 siblings, 0 replies; 6+ messages in thread
From: naruse @ 2015-11-27 18:16 UTC (permalink / raw)
  To: ruby-core

Issue #11744 has been updated by Yui NARUSE.

Status changed from Open to Closed

Great!
I'm glad to hear that.

----------------------------------------
Bug #11744: Open files being GC'ed while still in use
https://bugs.ruby-lang.org/issues/11744#change-55113

* Author: Godfrey Chan
* Status: Closed
* Priority: Normal
* Assignee: 
* ruby -v: 2.2.3
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
Hello!

We recently added a feature on Rails that uses the [`listen` gem](https://github.com/guard/listen) under-the-hood. Since we enabled those tests, we have been seeing random failures on Travis CI like [these](https://travis-ci.org/rails/rails/jobs/92563035#L976):

```
  1) Error:
KeyGeneratorTest#test_Generating_a_key_of_an_alternative_length:
Errno::EBADF: Bad file descriptor @ fptr_finalize - /home/travis/build/rails/rails/activesupport/lib/active_support/key_generator.rb
    /home/travis/build/rails/rails/activesupport/test/key_generator_test.rb:16:in `setup'
```

What's happening here is that the [setup code](https://github.com/rails/rails/blob/master/activesupport/test/key_generator_test.rb#L16) for this test is referencing an [constant previously marked for autoload](https://github.com/rails/rails/blob/master/activesupport/lib/active_support.rb#L52), and something went wrong *during* autoload (inside Ruby).

(By the way, I was referring to [Ruby's `autoload` feature](http://ruby-doc.org/core-2.2.3/Module.html#method-i-autoload) which is not specific to Rails, although we did [enhance it to infer the filename](https://github.com/rails/rails/blob/e1e6499ede1dd196c03f650b95c3a0098c7c32ff/activesupport/lib/active_support/dependencies/autoload.rb).)

Because there are so many things going on in the test suite, I tried to narrow down the problem further by removing most of the work we do.

Eventually, I arrived at this [reproduction script](https://github.com/rails/rails/blob/9982d60b34b976a826358644d1157d2589b92e94/activesupport/test/file_evented_update_checker_test.rb#L4-L39).

I disabled the rest of the test suite so that this file is the only test being ran, in three different configurations:

```
(1) LISTEN=1 LISTEN_GC_FIX=1 bundle exec rake test
(2) LISTEN=1 bundle exec rake test
(3) LISTEN=0 bundle exec rake test
```

In each of the tests, we tried to do a very simple `File.open(..., 'r') { |f| ... }` (reading in the first 10 bytes to make sure we are dealing with the correct file) up to 5000 times.

In (1), we set up 20 listeners, do no work, and properly tear down each of them, and disabling the GC before opening the files.

In (2), we also setup the listeners but do not disable the GC.

In (3), we do not set up the listeners nor disable the GC.

Scenario (2) reliably fails on Travis CI, and the rest reliably passes: https://travis-ci.org/rails/rails/builds/93402196

```
/home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `close': Bad file descriptor @ fptr_finalize - vendor/bundle/ruby/2.2.0/cache/bundler/git/mail-c03100656df0e86c13a2d6103ac408da7a4e728c/objects/34/fdd44ee4de34a2f05b0088562e38797db604a8 (Errno::EBADF)
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `open'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:31:in `block (2 levels) in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `glob'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:26:in `block in <top (required)>'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `chdir'
	from /home/travis/build/rails/rails/activesupport/test/file_evented_update_checker_test.rb:23:in `<top (required)>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `require'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `each'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:9:in `block in <main>'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `select'
	from /home/travis/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/rake/rake_test_loader.rb:4:in `<main>'
```

Note that we are using the "block form" `File.open` in the test here, and [the error](https://travis-ci.org/rails/rails/jobs/93402198#L4418) is happening during the implicit `close` call inside Ruby. Also note that we are *loading* the listen gem in all three cases, so merely having it loaded does not trigger the issue.

I was not able to reproduce this locally on a Mac, so this is probably Linux specific. From what I can tell, listen gem uses [rb-inotify](https://github.com/nex3/rb-inotify) (which in turns uses ffi) on Linux so that might be related too.

* * *

My hypothesis is that, so interaction between listen/rb-inotify/ffi during the listener setup/teardown phase is putting the Ruby VM/process into a bad state. In this state, somehow the VM is confused about what is being used. When the GC runs, it somehow thinks that the file we are currently reading (the `f` inside the `File.open(...) { |f| ... }` call) is no longer being used, causing it to be GC-ed and auto-closed. When the block finishes running (it did indeed open the file correctly, as it was able to read the first 10 bytes correctly), the [finalizer](https://github.com/ruby/ruby/blob/v2_2_3/io.c#L6387) code tries to close FD again, causing the OS to return a `EBADF` error.

While it is certainly possible that the listen/rb-inotify/ffi gems are doing something wrong, it seems quite strange that they could put the Ruby process in such a state, so I figured it is worth having Ruby core people look into this.

I tried to [reproduce this outside of Rails](https://github.com/chancancode/listen-bug) but so far I cannot make it work, I'll keep trying. (Feel free to fork it.)



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-11-27 17:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <redmine.issue-11744.20151126181251@ruby-lang.org>
2015-11-26 18:12 ` [ruby-core:71697] [Ruby trunk - Bug #11744] [Open] Open files being GC'ed while still in use godfreykfc
2015-11-26 18:25 ` [ruby-core:71698] [Ruby trunk - Bug #11744] " godfreykfc
2015-11-27  5:46 ` [ruby-core:71701] " naruse
2015-11-27  9:22 ` [ruby-core:71703] " godfreykfc
2015-11-27 16:35 ` [ruby-core:71707] " godfreykfc
2015-11-27 18:16 ` [ruby-core:71708] [Ruby trunk - Bug #11744] [Closed] " naruse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).