ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
From: jean.boussier@gmail.com
To: ruby-core@ruby-lang.org
Subject: [ruby-core:103421] [Ruby master Feature#17795] `before_fork` and `after_fork` callback API
Date: Tue, 13 Apr 2021 10:52:10 +0000 (UTC)	[thread overview]
Message-ID: <redmine.journal-91512.20210413105209.7941@ruby-lang.org> (raw)
In-Reply-To: redmine.issue-17795.20210412080908.7941@ruby-lang.org

Issue #17795 has been updated by byroot (Jean Boussier).


@jeremyevans0 I understand your point, and I agree that in an ideal world this wouldn't be necessary.

However pragmatically speaking, `getpid(2)` checks are about as old as `fork(2)`. Even the [`glibc` changelog](https://sourceware.org/glibc/wiki/Release/2.25#pid_cache_removal) acknowledge it:

> Applications performing getpid() calls in a loop will see the worst case performance degradation as the library call will perform a system call at each invocation.

And suggest to use `pthread_atfork()` instead:

> Applications performing getpid() in a loop that need to do some level of fork()-based invalidation can instead use pthread_atfork() to register handlers to handle the invalidation. 

I think it's a clear indication that this need is nothing new.

> The main issue I see with this is the potential for misuse.

I agree, but this argument could apply to a large part of Ruby, I'm not sure it's relevant.

----------------------------------------
Feature #17795: `before_fork` and `after_fork` callback API
https://bugs.ruby-lang.org/issues/17795#change-91512

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
Replaces: https://bugs.ruby-lang.org/issues/5446

### Context

Ruby code in production is very often running in a forking setup (puma, unicorn, etc), and it is common some types of libraries to need to know when the Ruby process was forked. For instance:

  - Most database clients, ORMs or other libraries keeping a connection pool might need to close connections before the fork happens.
  - Libraries relying on some kind of dispatcher thread might need to restart the thread in the forked children, and clear any internal buffer (e.g. statsd clients, newrelic_rpm).

**This need is only for forking the whole ruby process, extensions doing a `fork(2) + exec(2)` combo etc are not a concern, this aim at only catching `kernel.fork`, `Process.fork` and maybe `Process.daemon`.**.
The use case is for forks that end up executing Ruby code.

### Current solutions

Right now this use case is handled in several ways.

#### Rely on the integrating code to call a `before_fork` or `after_fork` callback.

Some libraries simply rely on documentation and require the user to use the hooks provided by their forking server.

Examples:

  - Sequel: http://sequel.jeremyevans.net/rdoc/files/doc/fork_safety_rdoc.html
  - Rails's Active Record: https://devcenter.heroku.com/articles/concurrency-and-database-connections#multi-process-servers
  - ScoutAPM (it tries to detect popular forking setup and register itself): https://github.com/scoutapp/scout_apm_ruby/blob/fa83793b9e8d2f9a32c920f59b57d7f198f466b8/lib/scout_apm/environment.rb#L142-L146
  - NewRelic RPM (similarly tries to register to popular forking setups): https://www.rubydoc.info/github/newrelic/rpm/NewRelic%2FAgent:after_fork


#### Continuously check `Process.pid`

Some libraries chose to instead keep the process PID in a variable, and to regularly compare it to `Process.pid` to detect forked children.
Unfortunately `Process.pid` is relatively slow on Linux, and these checks tend to be in tight loops, so it's not uncommon when using these libraries
to spend `1` or `2%` of runtime in `Process.pid`.

Examples:

  - Rails's Active Record used to check `Process.pid` https://github.com/Shopify/rails/blob/411ccbdab2608c62aabdb320d52cb02d446bb39c/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L946, it still does but a bit less: https://github.com/rails/rails/pull/41850
  - the `Typhoeus` HTTP client: https://github.com/typhoeus/typhoeus/blob/a345545e5e4ac0522b883fe0cf19e5e2e807b4b0/lib/typhoeus/pool.rb#L34-L42
  - Redis client: https://github.com/redis/redis-rb/blob/6542934f01b9c390ee450bd372209a04bc3a239b/lib/redis/client.rb#L384
  - Some older versions of NewRelic RPM: https://github.com/opendna/scoreranking-api/blob/8fba96d23b4d3e6b64f625079c184f3a292bbc12/vendor/gems/ruby/1.9.1/gems/newrelic_rpm-3.7.3.204/lib/new_relic/agent/harvester.rb#L39-L41

#### Continuously check `Thread#alive?`

Similar to checking `Process.pid`, but for the background thread use case. `Thread#alive?` is regularly checked, and if the thread is dead, it is assumed that the process was forked.
It's much less costly than a `Process.pid`, but also a bit less reliable as the thread could have died for other reasons. It also delays re-creating the thread to the next check rather than immediately upon forking.

Examples:

  - `statsd-instrument`: https://github.com/Shopify/statsd-instrument/blob/0445cca46e29aa48e9f1efec7c72352aff7ec931/lib/statsd/instrument/batched_udp_sink.rb#L63

#### Decorate `Kernel.fork` and `Process.fork`

Another solution is to prepend a module in `Process` and `Kernel`, to decorate the fork method and implement your own callback. It works well, but is made difficult by `Kernel.fork`.


Examples:

  - Active Support: https://github.com/rails/rails/blob/9aed3dcdfea6b64c18035f8e2622c474ba499846/activesupport/lib/active_support/fork_tracker.rb
  - `dd-trace-rb`: https://github.com/DataDog/dd-trace-rb/blob/793946146b4709289cfd459f3b68e8227a9f5fa7/lib/ddtrace/profiling/ext/forking.rb
  - To some extent, `nakayoshi_fork` decorates the `fork` method: https://github.com/ko1/nakayoshi_fork/blob/19ef5efc51e0ae51d7f5f37a0b785309bf16e97f/lib/nakayoshi_fork.rb

### Proposals

I see two possible features to improve this situation:

#### `Process.before_fork` and `Process.after_fork` callbacks

One solution would be for Ruby to expose a callback API for these two events, similar to `Kernel.at_exit`.

#### Make `Kernel.fork` a delegator

A simpler change would be to just make `Kernel.fork` a delegator to `Process.fork`. This would make it much easier to prepend a module on `Process` for each library to implement its own callback.

Proposed patch: https://github.com/ruby/ruby/pull/4361



-- 
https://bugs.ruby-lang.org/

  parent reply	other threads:[~2021-04-13 10:52 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-12  8:09 [ruby-core:103400] [Ruby master Feature#17795] `before_fork` and `after_fork` callback API jean.boussier
2021-04-12 13:24 ` [ruby-core:103402] " daniel
2021-04-12 14:12 ` [ruby-core:103403] " jean.boussier
2021-04-12 14:59 ` [ruby-core:103404] " daniel
2021-04-13  1:32 ` [ruby-core:103410] " muraken
2021-04-13  2:32 ` [ruby-core:103411] " merch-redmine
2021-04-13 10:52 ` jean.boussier [this message]
2021-04-13 11:16 ` [ruby-core:103423] " eregontp
2021-04-13 14:36 ` [ruby-core:103427] " mame
2021-04-13 14:46 ` [ruby-core:103428] " daniel
2021-04-13 15:17 ` [ruby-core:103431] " jean.boussier
2021-04-13 16:32 ` [ruby-core:103432] " mame
2021-04-13 22:22 ` [ruby-core:103443] " jean.boussier
2021-04-14  6:07 ` [ruby-core:103447] " ko1
2021-04-14  7:34 ` [ruby-core:103448] [Ruby master Feature#17795] Around `Process.fork` callbacks API jean.boussier
2021-04-16  8:03 ` [ruby-core:103476] " akr
2021-04-16  8:07 ` [ruby-core:103478] " jean.boussier
2021-04-16 13:28 ` [ruby-core:103482] " ivo.anjo
2021-04-17  7:56 ` [ruby-core:103495] " mame
2021-04-17 11:02 ` [ruby-core:103499] " jean.boussier
2021-04-17 17:43 ` [ruby-core:103501] " mame
2021-04-19  9:55 ` [ruby-core:103512] " jean.boussier
2021-04-21  7:30 ` [ruby-core:103537] " sam.saffron
2021-04-22 13:21 ` [ruby-core:103550] " jean.boussier
2021-04-22 14:34 ` [ruby-core:103552] " daniel
2021-04-22 15:23 ` [ruby-core:103553] " daniel
2021-04-22 18:58 ` [ruby-core:103558] " jean.boussier
2021-04-23 16:26 ` [ruby-core:103575] " eregontp
2021-05-03 11:34 ` [ruby-core:103697] " eregontp
2021-05-21 17:45 ` [ruby-core:103952] " mame
2021-05-21 21:23 ` [ruby-core:103959] " merch-redmine
2021-06-02  7:46 ` [ruby-core:104135] " ivo.anjo
2021-06-02  8:40 ` [ruby-core:104136] " jean.boussier
2021-07-15  7:18 ` [ruby-core:104614] " matz
2021-07-15  7:29 ` [ruby-core:104615] " mame
2021-07-15 20:15 ` [ruby-core:104624] " jean.boussier
2021-07-15 20:40 ` [ruby-core:104625] " jean.boussier
2021-07-16  6:07 ` [ruby-core:104628] " mame
2021-10-17 19:39 ` [ruby-core:105657] " Dan0042 (Daniel DeLorme)
2021-10-25  2:36 ` [ruby-core:105768] " matz (Yukihiro Matsumoto)
2021-10-25  4:18 ` [ruby-core:105771] " mame (Yusuke Endoh)
2021-10-25 10:03 ` [ruby-core:105785] " ioquatix (Samuel Williams)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.ruby-lang.org/en/community/mailing-lists/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=redmine.journal-91512.20210413105209.7941@ruby-lang.org \
    --to=ruby-core@ruby-lang.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).