ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:117658] [Ruby master Feature#20448] Make coverage event hooking C API public
@ 2024-04-23 17:55 ms-tob (Matt S) via ruby-core
  2024-04-23 20:02 ` [ruby-core:117661] " mame (Yusuke Endoh) via ruby-core
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: ms-tob (Matt S) via ruby-core @ 2024-04-23 17:55 UTC (permalink / raw)
  To: ruby-core; +Cc: ms-tob (Matt S)

Issue #20448 has been reported by ms-tob (Matt S).

----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448

* Author: ms-tob (Matt S)
* Status: Open
----------------------------------------
# Abstract

Gathering code coverage information is a well-known goal within software engineering. It is most commonly used to assess code coverage during automated testing. A lesser-known use-case is coverage-guided fuzz testing, which will be the primary use-case presented in this issue. This issue exists to request that Ruby coverage event hooking be made part of its official, public C API.

# Background

Ruby currently provides a number of avenues for hooking events *or* gathering coverage information:

1. The [Coverage](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html) module
2. The [TracePoint](https://ruby-doc.org/3.3.0/TracePoint.html) module
3. The [rb_add_event_hook](https://ruby-doc.org/3.3.0/extension_rdoc.html#label-Hooks+for+the+interpreter+events) extension function

Unfortunately, none of these pieces of functionality solve this issue's specific use-case. The `Coverage` module is not a great fit for real-time coverage analysis with an unknown start and stop point. Coverage-guided fuzz testing requires this. The `TracePoint` module and `rb_add_event_hook` are not able to hook branch and line coverage events. Coverage-guided fuzz testing typically tracks branch events.
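
To illustrate the limitation, here is a minimal sketch of what the `Coverage` module offers today: counters that are polled with `Coverage.peek_result`, rather than a callback that fires as each branch executes. The temp file and the `classify` method are hypothetical stand-ins for real target code.

```ruby
require "coverage"
require "tempfile"

# Hypothetical target, written to a temp file so the example is self-contained.
target = Tempfile.new(["target", ".rb"])
target.write(<<~RUBY)
  def classify(n)
    if n > 0 then :positive else :non_positive end
  end
RUBY
target.close

# Measurement must be switched on before the target is compiled.
Coverage.start(branches: true)
load target.path
classify(1)

# peek_result is a poll: it returns accumulated counters, not per-event callbacks.
_path, snapshot = Coverage.peek_result.find do |path, _|
  File.basename(path) == File.basename(target.path)
end
branch_counts = snapshot[:branches].values.flat_map(&:values)

Coverage.result # stops measurement
```

A fuzzer built this way would have to poll and diff the counters after every input, which is the awkward fit described above.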

# Proposal

The ultimate goal is to enable Ruby C extensions to process coverage events in real-time. I did some cursory investigation into the Ruby C internals to determine what it would take to achieve this, but I'm by no means an expert, so my list may be incomplete.

The good news is that much of this functionality already exists, but it's part of the private, internal-only C API.

1. Make `RUBY_EVENT_COVERAGE_LINE` and `RUBY_EVENT_COVERAGE_BRANCH` public: https://github.com/ruby/ruby/blob/v3_3_0/vm_core.h#L2182-L2184
  a. This would be an addition to the current public event types: https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/internal/event.h#L32-L46
2. Allow initializing global coverage state so that coverage tracking can be fully enabled
  a. Currently, if `Coverage.setup` or `Coverage.start` is not called, then coverage events cannot be hooked. I do not fully understand why this is, but I believe it has something to do with `rb_get_coverages` and `rb_set_coverages`. If calls to `rb_get_coverages` return `NULL` (https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L641-L647, https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L864-L868), then coverage hooking will not be enabled. I believe the `Coverage` module initializes that state via a `rb_set_coverages` call here: https://github.com/ruby/ruby/blob/v3_3_0/ext/coverage/coverage.c#L112-L120.
  b. So, to achieve this goal, a C extension would need to be able to call `rb_set_coverages` or somehow initialize the global coverage state.

I've actually been able to achieve this functionality by calling undocumented features and defining `RUBY_EVENT_COVERAGE_BRANCH`:

```c
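#include <ruby.h>
#include <ruby/debug.h>

// Internal flag copied from vm_core.h (v3_3_0); it is not in the public headers.
#define RUBY_EVENT_COVERAGE_BRANCH 0x020000

// ...

// Subscribe to branch coverage events only.
rb_event_flag_t events = RUBY_EVENT_COVERAGE_BRANCH;
// RAW_ARG makes the hook receive (data, trace_arg) instead of the expanded arguments.
rb_event_hook_flag_t flags = (
    RUBY_EVENT_HOOK_FLAG_SAFE | RUBY_EVENT_HOOK_FLAG_RAW_ARG
);
// counter_hash is handed to event_hook_branch as its data argument.
rb_add_event_hook2(
    (rb_event_hook_func_t) event_hook_branch,
    events,
    counter_hash,
    flags
);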
```

If I call `Coverage.setup(branches: true)` and add this event hook, then branch hooking works as expected. `rb_add_event_hook2` will still respect the `RUBY_EVENT_COVERAGE_BRANCH` value if it's passed. But it would be better if I could rely on official functionality rather than undocumented features.

The above two points would be requirements for this functionality, but there's an additional nice-to-have:

3. Extend the public `tracearg` functionality to include additional coverage information
  a. Currently, `tracearg` offers information like `rb_tracearg_lineno` and `rb_tracearg_path`. It would be helpful if it also provided additional coverage information like `coverage.c`'s column information and a unique identifier for each branch. Currently, I can only use `(path, lineno)` as a unique identifier for a branch because that's what's offered by the public API. Since there can be multiple `if` statements on a single line, this identification is ambiguous; more information, like the column number, would help uniquely identify branches.
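
The ambiguity can be seen with the existing `Coverage` module, whose branch keys already carry column information. The sketch below is only an illustration: the temp file and the `pick` method are hypothetical, and it relies on the documented key layout `[type, id, first_lineno, first_column, last_lineno, last_column]`.

```ruby
require "coverage"
require "tempfile"

# Hypothetical target with two conditionals on the same line (line 2).
target = Tempfile.new(["same_line", ".rb"])
target.write(<<~RUBY)
  def pick(x, y)
    [x ? 1 : 2, y ? 3 : 4]
  end
RUBY
target.close

Coverage.start(branches: true)
load target.path
result = Coverage.result

_path, data = result.find do |path, _|
  File.basename(path) == File.basename(target.path)
end

# Each key is [type, id, first_lineno, first_column, last_lineno, last_column].
branch_keys = data[:branches].keys
line_only   = branch_keys.map { |key| key[2] }.uniq           # collides
with_column = branch_keys.map { |key| [key[2], key[3]] }.uniq # distinct
```

Keyed by line alone, both conditionals collapse into one entry; adding the start column keeps them apart.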

# Use cases

This use-case was born out of a new coverage-guided Ruby fuzzer: https://github.com/trailofbits/ruzzy. You can read more about its implementation details here: https://blog.trailofbits.com/2024/03/29/introducing-ruzzy-a-coverage-guided-ruby-fuzzer/. You can also find the Ruby C extension code behind its implementation here: https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L220-L231.

So, the primary use-case here is enabling real-time, coverage-guided fuzz testing of Ruby code. However, as mentioned in the abstract, gathering code coverage information is useful in many domains. For example, it could enable new workflows in standard unit/integration test coverage. It could also enable gathering coverage information in real-time as an application is running. I see this as the most generalized form of gathering code coverage information, and something like the `Coverage` module as a specialized implementation. Another example, https://bugs.ruby-lang.org/issues/20282 may be solved by this more generalized solution.

We are tracking this request downstream here: https://github.com/trailofbits/ruzzy/issues/9

# Discussion

Fuzz testing is another tool in a tester's toolbelt. It is an increasingly common way to improve software's robustness. Go has it built into the standard library, Python has Atheris, Java has Jazzer, JavaScript has Jazzer.js, etc. OSS-Fuzz has helped identify and fix over 10,000 vulnerabilities and 36,000 bugs [using fuzzing](https://google.github.io/oss-fuzz/#trophies). Ruby deserves a good fuzzer, and improving coverage gathering would help achieve that goal.

The `Coverage` module, `TracePoint` module, and `rb_add_event_hook` function seem like they could fulfill this goal. However, after deeper investigation, none of them fit the exact requirements for this use-case.

# See also

- https://bugs.ruby-lang.org/issues/20282
- https://github.com/google/atheris
  - https://security.googleblog.com/2020/12/how-atheris-python-fuzzer-works.html
- https://github.com/CodeIntelligenceTesting/jazzer/
  - https://www.code-intelligence.com/blog/java-fuzzing-with-jazzer
- https://go.dev/doc/security/fuzz/



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:117661] [Ruby master Feature#20448] Make coverage event hooking C API public
  2024-04-23 17:55 [ruby-core:117658] [Ruby master Feature#20448] Make coverage event hooking C API public ms-tob (Matt S) via ruby-core
@ 2024-04-23 20:02 ` mame (Yusuke Endoh) via ruby-core
  2024-04-24 12:07 ` [ruby-core:117683] " ms-tob (Matt S) via ruby-core
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2024-04-23 20:02 UTC (permalink / raw)
  To: ruby-core; +Cc: mame (Yusuke Endoh)

Issue #20448 has been updated by mame (Yusuke Endoh).


First of all, it's great that you are creating a fuzzing tool for Ruby. Thank you for your work.

I understand that what you need is a hook for LINE and BRANCH events. Am I correct?

If so, did you try RUBY_EVENT_LINE? It is already a public event and should be almost equivalent to RUBY_EVENT_COVERAGE_LINE. One difference is that the RUBY_EVENT_COVERAGE_LINE hook is only called for bytecode built after `Coverage.start` is executed (i.e., after `rb_get_coverages` returns non-null), whereas the RUBY_EVENT_LINE hook is called for all bytecode, including bytecode built before `Coverage.start`. I don't know if this difference will be a problem for your use case, but if necessary, you could filter by file path.
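
As a rough Ruby-level illustration of filtering by file path, the sketch below uses `TracePoint` with the `:line` event, which corresponds to RUBY_EVENT_LINE. The temp file is a hypothetical stand-in for the code under test; a C extension would apply the same filter inside its hook.

```ruby
require "set"
require "tempfile"

# Hypothetical two-line target file.
target = Tempfile.new(["target", ".rb"])
target.write("x = 1\ny = x + 1\n")
target.close
target_name = File.basename(target.path)

seen = Set.new
tracer = TracePoint.new(:line) do |tp|
  # Line events fire for every file, so keep only the target's lines.
  seen << tp.lineno if File.basename(tp.path.to_s) == target_name
end

# Trace only while the target is being loaded and executed.
tracer.enable { load target.path }
```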

RUBY_EVENT_COVERAGE_BRANCH is more complicated because it strongly depends on compiler implementation details. This feature requires an instrumented bytecode to be output at compile time, so it must be enabled before compilation of target Ruby code. Also, currently, column information is not left in the byte code. The current branch coverage measurement is achieved by separately bookkeeping the column information at compile time. I don't know if these can be cut out as a clean API.
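
The compile-time dependency can be observed from Ruby with the `Coverage` module. This is a minimal sketch under stated assumptions: `write_source` is a hypothetical helper, and the two temp files stand in for code compiled before and after coverage is enabled.

```ruby
require "coverage"
require "tempfile"

# Hypothetical helper: writes a one-method Ruby file and returns its path.
def write_source(prefix, method_name)
  file = Tempfile.create([prefix, ".rb"])
  file.write("def #{method_name}(n)\n  n.zero? ? :zero : :other\nend\n")
  file.close
  file.path
end

early = write_source("early", "early_check")
late  = write_source("late", "late_check")

load early                      # compiled while coverage is off
Coverage.start(branches: true)
load late                       # compiled while coverage is on
early_check(0)                  # runs, but its bytecode is not instrumented
late_check(0)

tracked = Coverage.result.keys.map { |path| File.basename(path) }
```

Only the file compiled after `Coverage.start` shows up in the result, even though both methods ran while coverage was active.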

----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448#change-108069


* [ruby-core:117683] [Ruby master Feature#20448] Make coverage event hooking C API public
  2024-04-23 17:55 [ruby-core:117658] [Ruby master Feature#20448] Make coverage event hooking C API public ms-tob (Matt S) via ruby-core
  2024-04-23 20:02 ` [ruby-core:117661] " mame (Yusuke Endoh) via ruby-core
@ 2024-04-24 12:07 ` ms-tob (Matt S) via ruby-core
  2024-04-24 15:48 ` [ruby-core:117688] " mame (Yusuke Endoh) via ruby-core
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: ms-tob (Matt S) via ruby-core @ 2024-04-24 12:07 UTC (permalink / raw)
  To: ruby-core; +Cc: ms-tob (Matt S)

Issue #20448 has been updated by ms-tob (Matt S).


> I understand that what you need is a hook for LINE and BRANCH events. Am I correct?

I just need a hook for BRANCH events. You can find that C extension code here:

https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L220-L231

And I manually define `RUBY_EVENT_COVERAGE_BRANCH` above with `#define RUBY_EVENT_COVERAGE_BRANCH 0x020000`.

> RUBY_EVENT_COVERAGE_BRANCH is more complicated because it strongly depends on compiler implementation details. This feature requires an instrumented bytecode to be output at compile time, so it must be enabled before compilation of target Ruby code. Also, currently, column information is not left in the byte code. The current branch coverage measurement is achieved by separately bookkeeping the column information at compile time. I don't know if these can be cut out as a clean API.

I was able to figure out a workaround for this. By reviewing the [`Coverage`](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html#module-Coverage-label-Usage) module documentation I learned that you have to start coverage gathering, and then `require` a separate Ruby module. My C extension takes the following actions to perform this:

- Enable coverage gathering: https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L195-L211
- Then, `require` the caller-provided module name: https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L233

Due to this behavior, when fuzzing pure Ruby code (not C extensions), Ruzzy requires two separate scripts. A "tracer" script and a fuzzing harness: https://github.com/trailofbits/ruzzy#fuzzing-pure-ruby-code.

Having this behavior built into the language is extremely helpful. Without it, as you mentioned, we would have to manually instrument the bytecode at runtime to achieve coverage-guided fuzzing. This is what Atheris has to do to fuzz Python code, and it requires significantly more investment in the solution. So, I really appreciate that Ruby has this built in, even if it's private, internal-only functionality.

----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448#change-108093


* [ruby-core:117688] [Ruby master Feature#20448] Make coverage event hooking C API public
  2024-04-23 17:55 [ruby-core:117658] [Ruby master Feature#20448] Make coverage event hooking C API public ms-tob (Matt S) via ruby-core
  2024-04-23 20:02 ` [ruby-core:117661] " mame (Yusuke Endoh) via ruby-core
  2024-04-24 12:07 ` [ruby-core:117683] " ms-tob (Matt S) via ruby-core
@ 2024-04-24 15:48 ` mame (Yusuke Endoh) via ruby-core
  2024-04-24 15:57 ` [ruby-core:117689] " mame (Yusuke Endoh) via ruby-core
  2024-05-01 13:39 ` [ruby-core:117742] " ms-tob (Matt S) via ruby-core
  4 siblings, 0 replies; 6+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2024-04-24 15:48 UTC (permalink / raw)
  To: ruby-core; +Cc: mame (Yusuke Endoh)

Issue #20448 has been updated by mame (Yusuke Endoh).


> I just need a hook for BRANCH events.

It's a hard mode ;-)

> Ruzzy requires two separate scripts

I guess that the tracer script is always the same (except for the file name of the test harness). If so, you may want to make it a library like `ruzzy/tracer.rb`. A user then only needs to write `test_harness.rb`, and can invoke it with `ruby -rruzzy/tracer test_harness.rb`. This is just a matter of taste, though.

Now, I wonder what API design would be good. Can we make TracePoint support branch events? If so, I think coverage.so can be reimplemented on top of TracePoint.

How about an API like this one? (I am not sure if this is really implementable until someone implements it.)

```ruby
TracePoint.enable_branch_tracepoints

TracePoint.new(:branch) do |tp|
  p tp.branch_id #=> Integer ID which is unique for each file path

  p tp.branch_from #=> [first_lineno, first_column, last_lineno, last_column]
  p tp.branch_to #=> [first_lineno, first_column, last_lineno, last_column]
end.enable

load "target.rb"

TracePoint.disable_branch_tracepoints
```

@ko1 What do you think?
@ms-tob Would it be sufficient for your use case?

----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448#change-108097

* Author: ms-tob (Matt S)
* Status: Open
----------------------------------------
# Abstract

Gathering code coverage information is a well-known goal within software engineering. It is most commonly used to assess code coverage during automated testing. A lesser known use-case is coverage-guided fuzz testing, which will be the primary use-case presented in this issue. This issue exists to request that Ruby coverage event hooking be made part of its official, public C API.

# Background

Ruby currently provides a number of avenues for hooking events *or* gathering coverage information:

1. The [Coverage](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html) module
2. The [TracePoint](https://ruby-doc.org/3.3.0/TracePoint.html) module
3. The [rb_add_event_hook](https://ruby-doc.org/3.3.0/extension_rdoc.html#label-Hooks+for+the+interpreter+events) extension function

Unfortunately, none of these pieces of functionality solve this issue's specific use-case. The `Coverage` module is not a great fit for real-time coverage analysis with an unknown start and stop point. Coverage-guided fuzz testing requires this. The `TracePoint` module and `rb_add_event_hook` are not able to hook branch and line coverage events. Coverage-guided fuzz testing typically tracks branch events.

# Proposal

The ultimate goal is to enable Ruby C extensions to process coverage events in real-time. I did some cursory investigation into the Ruby C internals to determine what it would take to achieve this, but I'm by no means an expert, so my list may be incomplete.

The good news is that much of this functionality already exists, but it's part of the private, internal-only C API.

1. Make `RUBY_EVENT_COVERAGE_LINE` and `RUBY_EVENT_COVERAGE_BRANCH` public: https://github.com/ruby/ruby/blob/v3_3_0/vm_core.h#L2182-L2184
  a. This would be an addition to the current public event types: https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/internal/event.h#L32-L46
2. Allow initializing global coverage state so that coverage tracking can be fully enabled
  a. Currently, if `Coverage.setup` or `Coverage.start` is not called, then coverage events cannot be hooked. I do not fully understand why this is, but I believe it has something to do with `rb_get_coverages` and `rb_set_coverages`. If calls to `rb_get_coverages` return `NULL` (https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L641-L647, https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L864-L868), then coverage hooking will not be enabled. I believe the `Coverage` module initializes that state via a `rb_set_coverages` call here: https://github.com/ruby/ruby/blob/v3_3_0/ext/coverage/coverage.c#L112-L120.
  b. So, to achieve this goal, a C extension would need to be able to call `rb_set_coverages` or somehow initialize the global coverage state.

I've actually been able to achieve this functionality by calling undocumented features and defining `RUBY_EVENT_COVERAGE_BRANCH`:

```c
#include <ruby.h>
#include <ruby/debug.h>

#define RUBY_EVENT_COVERAGE_BRANCH 0x020000

// ...

rb_event_flag_t events = RUBY_EVENT_COVERAGE_BRANCH;
rb_event_hook_flag_t flags = (
    RUBY_EVENT_HOOK_FLAG_SAFE | RUBY_EVENT_HOOK_FLAG_RAW_ARG
);
rb_add_event_hook2(
    (rb_event_hook_func_t) event_hook_branch,
    events,
    counter_hash,
    flags
);
```

If I call `Coverage.setup(branches: true)`, and add this event hook, then branch hooking works as expected. `rb_add_event_hook2` will still respect the `RUBY_EVENT_COVERAGE_BRANCH` value if its passed. But it would be better if I could rely on official functionality rather than undocumented features.

The above two points would be requirements for this functionality, but there's an additional nice-to-have:

3. Extend the public `tracearg` functionality to include additional coverage information
  a. Currently, `tracearg` offers information like `rb_tracearg_lineno` and `rb_tracearg_path`. It would be helpful if it also provided additional coverage information like `coverage.c`'s column information and a unique identifier for each branch. Currently, I can only use `(path, lineno)` as a unique identifier for a branch because that's what the public API offers. Since there can be multiple `if` statements on a single line, that pair can identify a branch event ambiguously. More information, such as the column number, would help uniquely identify branches.
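
The ambiguity can be illustrated with the existing `Coverage` module, which already records column information. This is a sketch only: the target source, the file name `same_line.rb`, and the helper name are hypothetical. It places two `if` modifiers on one line and compares how many branches a line number alone can distinguish with how many a line and column pair can.

```ruby
require "coverage"
require "tmpdir"

# Two `if` modifiers deliberately share line 1 of the target file.
SAME_LINE_SOURCE = "def classify(a, b); r = 0; r += 1 if a; r += 2 if b; r; end\n"

# Returns the branch "base" entries that Coverage records for the target.
# Each base has the form
# [type, id, first_lineno, first_column, last_lineno, last_column].
def branch_bases_for_same_line_source
  Dir.mktmpdir do |dir|
    path = File.join(dir, "same_line.rb")
    File.write(path, SAME_LINE_SOURCE)

    Coverage.start(branches: true)
    load path
    classify(true, false)
    result = Coverage.result

    entry = result.find { |file, _| File.basename(file) == "same_line.rb" }
    entry[1][:branches].keys
  end
end

bases = branch_bases_for_same_line_source
by_line        = bases.map { |b| b[2] }.uniq          # what a line number can tell apart
by_line_column = bases.map { |b| [b[2], b[3]] }.uniq  # what line plus column can tell apart
puts "branches: #{bases.size}, by line: #{by_line.size}, by line+column: #{by_line_column.size}"
```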

# Use cases

This use-case was born out of a new coverage-guided Ruby fuzzer: https://github.com/trailofbits/ruzzy. You can read more about its implementation details here: https://blog.trailofbits.com/2024/03/29/introducing-ruzzy-a-coverage-guided-ruby-fuzzer/. You can also find the Ruby C extension code behind its implementation here: https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L220-L231.

So, the primary use-case here is enabling real-time, coverage-guided fuzz testing of Ruby code. However, as mentioned in the abstract, gathering code coverage information is useful in many domains. For example, it could enable new workflows in standard unit/integration test coverage. It could also enable gathering coverage information in real-time as an application is running. I see this as the most generalized form of gathering code coverage information, and something like the `Coverage` module as a specialized implementation. As another example, https://bugs.ruby-lang.org/issues/20282 may also be solved by this more generalized solution.

We are tracking this request downstream here: https://github.com/trailofbits/ruzzy/issues/9

# Discussion

Fuzz testing is another tool in a tester's toolbelt. It is an increasingly common way to improve software's robustness. Go has it built into the standard library, Python has Atheris, Java has Jazzer, JavaScript has Jazzer.js, etc. OSS-Fuzz has helped identify and fix over 10,000 vulnerabilities and 36,000 bugs [using fuzzing](https://google.github.io/oss-fuzz/#trophies). Ruby deserves a good fuzzer, and improving coverage gathering would help achieve that goal.

The `Coverage` module, `TracePoint` module, and `rb_add_event_hook` function seem like they could fulfill this goal. However, after deeper investigation, none of them fit the exact requirements for this use-case.

# See also

- https://bugs.ruby-lang.org/issues/20282
- https://github.com/google/atheris
  - https://security.googleblog.com/2020/12/how-atheris-python-fuzzer-works.html
- https://github.com/CodeIntelligenceTesting/jazzer/
  - https://www.code-intelligence.com/blog/java-fuzzing-with-jazzer
- https://go.dev/doc/security/fuzz/



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:117689] [Ruby master Feature#20448] Make coverage event hooking C API public
  2024-04-23 17:55 [ruby-core:117658] [Ruby master Feature#20448] Make coverage event hooking C API public ms-tob (Matt S) via ruby-core
                   ` (2 preceding siblings ...)
  2024-04-24 15:48 ` [ruby-core:117688] " mame (Yusuke Endoh) via ruby-core
@ 2024-04-24 15:57 ` mame (Yusuke Endoh) via ruby-core
  2024-05-01 13:39 ` [ruby-core:117742] " ms-tob (Matt S) via ruby-core
  4 siblings, 0 replies; 6+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2024-04-24 15:57 UTC (permalink / raw)
  To: ruby-core; +Cc: mame (Yusuke Endoh)

Issue #20448 has been updated by mame (Yusuke Endoh).


Oh, this API does not allow getting the branch_from and branch_to information for branches that are never fired, so it is insufficient for reimplementing coverage.so on top of TracePoint. We have to actually create a patch and identify the appropriate API.
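
For illustration, the point about never-fired branches can be seen in what the `Coverage` module reports today. The sketch below is not part of the proposal; the target source, the file name `one_sided.rb`, and the helper name are hypothetical. It runs only one side of an `if` and prints the recorded targets with their hit counts. A purely event-driven hook would never learn about the target that stays at zero.

```ruby
require "coverage"
require "tmpdir"

ONE_SIDED_SOURCE = <<~RUBY
  def greet(name)
    if name == "John"
      :john
    else
      :someone_else
    end
  end
RUBY

# Runs only the `then` side and returns { target => hit_count } for the `if`.
def targets_after_one_sided_run
  Dir.mktmpdir do |dir|
    path = File.join(dir, "one_sided.rb")
    File.write(path, ONE_SIDED_SOURCE)

    Coverage.start(branches: true)
    load path
    greet("John") # the `else` target never fires
    result = Coverage.result

    entry = result.find { |file, _| File.basename(file) == "one_sided.rb" }
    entry[1][:branches].values.first
  end
end

targets_after_one_sided_run.each do |target, count|
  puts "#{target.first}: #{count}"
end
```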

----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448#change-108098

* Author: ms-tob (Matt S)
* Status: Open
----------------------------------------

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:117742] [Ruby master Feature#20448] Make coverage event hooking C API public
  2024-04-23 17:55 [ruby-core:117658] [Ruby master Feature#20448] Make coverage event hooking C API public ms-tob (Matt S) via ruby-core
                   ` (3 preceding siblings ...)
  2024-04-24 15:57 ` [ruby-core:117689] " mame (Yusuke Endoh) via ruby-core
@ 2024-05-01 13:39 ` ms-tob (Matt S) via ruby-core
  4 siblings, 0 replies; 6+ messages in thread
From: ms-tob (Matt S) via ruby-core @ 2024-05-01 13:39 UTC (permalink / raw)
  To: ruby-core; +Cc: ms-tob (Matt S)

Issue #20448 has been updated by ms-tob (Matt S).


Sorry for the delay. I've been considering different APIs and reevaluating Ruzzy's coverage collection. I think I've identified that Ruzzy's current coverage collection mechanism is flawed, so perhaps we should not use that as the goal. I'll explain the goal at a high level, show where I think Ruzzy's implementation is flawed, and I'd be curious to hear what you think.

For fuzzing coverage collection the ultimate goal will be to instrument [basic blocks](https://en.wikipedia.org/wiki/Basic_block). To instrument basic blocks we need a unique identifier for each one. These identifiers do not have to be consistent across fuzzing runs, just unique within a single fuzzing run (i.e. an invocation of the fuzzing program). For Ruby, I think this would look something like the following:

```ruby
def main
  # Basic block: BB1, ignore root block for now, it's a special case
  # ...
  if username == "John"
    # Basic block BB2
    # ...
  elsif username == "Jane"
    # Basic block BB3
    # ...
  else
    # Basic block BB4
    # ...
  end
  # ...
  unless username == "David"
    # Basic block BB5
    # ...
  end
  # ...
  case username
  when "Danielle"
    # Basic block BB6
    # ...
  when "Mark"
    # Basic block BB7
    # ...
  else
    # Basic block BB8
    # ...
  end
  # ...
end
```

Forgive me if I'm missing some branching constructs, but the goal here is to maximize branch coverage. So we'd like to be able to notify the fuzzer that it has generated an input that increased coverage. In other words, it has found a new branch. This is accomplished by identifying new basic blocks after a branch event. Atheris achieves this by rewriting the Python bytecode and [inserting a function call with a unique identifier in the branch's target basic block](https://github.com/google/atheris/blob/2.3.0/src/instrument_bytecode.py#L758-L825). It does this for all [conditional jumps](https://github.com/google/atheris/blob/2.3.0/src/version_dependent.py#L49-L67), i.e. branches.
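
The feedback step described above can be sketched in a few lines. This is a minimal illustration, not how Ruzzy or Atheris implement it: the `SeenBlocks` class is hypothetical, and the symbols `:BB1` through `:BB8` stand in for whatever unique identifiers the instrumentation would report.

```ruby
require "set"

# Tracks which basic-block identifiers have been observed during a fuzzing run.
class SeenBlocks
  def initialize
    @seen = Set.new
  end

  # Records the blocks hit by one input and returns the ones that are new.
  def record(block_ids)
    fresh = block_ids.reject { |id| @seen.include?(id) }.uniq
    @seen.merge(fresh)
    fresh
  end

  # Number of distinct blocks observed so far.
  def size
    @seen.size
  end
end

seen = SeenBlocks.new
p seen.record(%i[BB1 BB2 BB5 BB8]) # first input: every block is new
p seen.record(%i[BB1 BB2 BB5 BB6]) # second input: only BB6 is new
p seen.record(%i[BB1 BB2 BB5 BB6]) # same path again: nothing new
```

An input whose `record` call returns a non-empty list increased coverage and would be kept for further mutation. The identifiers only need to be unique within one run, as noted above.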

Currently, Ruzzy is only instrumenting `(filepath, lineno)` tuples during branch events (`RUBY_EVENT_COVERAGE_BRANCH`). I think this is effectively producing instrumentation like the following:

```ruby
def main
  # ...
  if username == "John" # Basic block BB1
    # ...
  elsif username == "Jane"
    # ...
  else
    # ...
  end
  # ...
  unless username == "David" # Basic block BB2
    # ...
  end
  # ...
  case username # Basic block BB3 (are case statements included in branch events?)
  when "Danielle"
    # ...
  when "Mark"
    # ...
  else
    # ...
  end
  # ...
end
```

I'll have to do some testing to confirm if this is how `RUBY_EVENT_COVERAGE_BRANCH` works or not. Correct me if I'm wrong, but I do not think hooking `RUBY_EVENT_COVERAGE_BRANCH` will work because it does not provide information on which branch is being taken.
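
As a partial data point, the `Coverage` module's branch mode can be asked which constructs it records. The sketch below assumes a hypothetical target file `constructs.rb` modeled on the example above. It shows what `Coverage` reports after the fact, which is not necessarily the same as what a `RUBY_EVENT_COVERAGE_BRANCH` hook observes.

```ruby
require "coverage"
require "tmpdir"

CONSTRUCTS_SOURCE = <<~RUBY
  def route(username)
    seen = []
    if username == "John"
      seen << :bb2
    elsif username == "Jane"
      seen << :bb3
    else
      seen << :bb4
    end
    unless username == "David"
      seen << :bb5
    end
    case username
    when "Danielle" then seen << :bb6
    when "Mark" then seen << :bb7
    else seen << :bb8
    end
    seen
  end
RUBY

# Returns the construct type of every branch base recorded for the target.
def recorded_branch_types
  Dir.mktmpdir do |dir|
    path = File.join(dir, "constructs.rb")
    File.write(path, CONSTRUCTS_SOURCE)

    Coverage.start(branches: true)
    load path
    route("Mark")
    result = Coverage.result

    entry = result.find { |file, _| File.basename(file) == "constructs.rb" }
    entry[1][:branches].keys.map(&:first)
  end
end

p recorded_branch_types
```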

The [branches information](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html#module-Coverage-label-Branches+Coverage) in the `Coverage` module is very close to what we need. It provides a unique identifier for each basic block. The only deficiency is that we need this information in real time, as each branch event happens.

The `debug.h` header publicly provides us with a [`rb_trace_arg_t`](https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/debug.h#L465) during branch events, but we don't have access to any of the [`rb_trace_arg_struct`](https://github.com/ruby/ruby/blob/v3_3_0/vm_core.h#L2089-L2104) internals. The `Coverage.result` functionality is ultimately provided with this internal coverage information by a call to [`rb_get_coverages`](https://github.com/ruby/ruby/blob/v3_3_0/thread.c#L5685-L5689). It then generates unique identifiers for basic blocks after coverage information has already been gathered.
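
The closest approximation available through the public Ruby API today is to poll `Coverage.peek_result` between inputs. The sketch below shows that workaround, not the hooking API requested here: the target file `polled.rb` and the helper names are hypothetical, and the coverage information only becomes visible after each input has finished rather than as each branch fires.

```ruby
require "coverage"
require "set"
require "tmpdir"

POLLED_SOURCE = <<~RUBY
  def check(username)
    if username == "John"
      :john
    else
      :other
    end
  end
RUBY

# Flattens a Coverage result into the set of branch targets hit so far.
# Each identifier is [path, base, target], mirroring the Coverage result layout.
def hit_targets(result)
  ids = Set.new
  result.each do |path, data|
    (data[:branches] || {}).each do |base, targets|
      targets.each { |target, count| ids << [path, base, target] if count > 0 }
    end
  end
  ids
end

# Feeds inputs one at a time and returns how many new targets each one reached.
def new_targets_per_input(inputs)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "polled.rb")
    File.write(path, POLLED_SOURCE)

    Coverage.start(branches: true)
    load path
    seen = hit_targets(Coverage.peek_result)  # baseline after loading
    counts = inputs.map do |input|
      check(input)
      now = hit_targets(Coverage.peek_result) # snapshot after the input ran
      fresh = now - seen
      seen = now
      fresh.size
    end
    Coverage.result                           # stop and clear coverage state
    counts
  end
end

p new_targets_per_input(["John", "John", "Jane"])
```

Each poll copies the full coverage table, so the cost grows with the amount of loaded code. That overhead is one reason a per-event hook is preferable for fuzzing.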

I wonder, is there a way to generalize this functionality such that the `Coverage` module can still use it, and a hooking API is provided to publicly expose this coverage gathering in real time? I think something like the following would work:

```ruby
TracePoint.enable_branch_tracepoints

TracePoint.new(:branch) do |tp|
  p tp.branch_id #=> Integer ID which is unique for each file path
  p tp.branch_target #=> [id, first_lineno, first_column, last_lineno, last_column], Another unique ID for branch target
end.enable

load "target.rb"

TracePoint.disable_branch_tracepoints
```

This is relatively close to the existing [`Coverage` branches result](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html#module-Coverage-label-Branches+Coverage). What do you think?

----------------------------------------
Feature #20448: Make coverage event hooking C API public
https://bugs.ruby-lang.org/issues/20448#change-108153

* Author: ms-tob (Matt S)
* Status: Open
----------------------------------------

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-05-01 13:39 UTC | newest]

Thread overview: 6+ messages
2024-04-23 17:55 [ruby-core:117658] [Ruby master Feature#20448] Make coverage event hooking C API public ms-tob (Matt S) via ruby-core
2024-04-23 20:02 ` [ruby-core:117661] " mame (Yusuke Endoh) via ruby-core
2024-04-24 12:07 ` [ruby-core:117683] " ms-tob (Matt S) via ruby-core
2024-04-24 15:48 ` [ruby-core:117688] " mame (Yusuke Endoh) via ruby-core
2024-04-24 15:57 ` [ruby-core:117689] " mame (Yusuke Endoh) via ruby-core
2024-05-01 13:39 ` [ruby-core:117742] " ms-tob (Matt S) via ruby-core
