From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on starla X-Spam-Level: X-Spam-Status: No, score=-1.0 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_HELO_PASS,SPF_PASS,URIBL_SBL_A autolearn=ham autolearn_force=no version=3.4.6 Received: from nue.mailmanlists.eu (nue.mailmanlists.eu [94.130.110.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 492301F44D for ; Wed, 24 Apr 2024 15:57:24 +0000 (UTC) Authentication-Results: dcvr.yhbt.net; dkim=pass (1024-bit key; secure) header.d=ml.ruby-lang.org header.i=@ml.ruby-lang.org header.a=rsa-sha256 header.s=mail header.b=aHlVX0Ie; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.a=rsa-sha256 header.s=s1 header.b=Skao4FLD; dkim-atps=neutral Received: from nue.mailmanlists.eu (localhost [127.0.0.1]) by nue.mailmanlists.eu (Postfix) with ESMTP id A3AD88445F; Wed, 24 Apr 2024 15:57:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ml.ruby-lang.org; s=mail; t=1713974235; bh=Xu2c3MKUFln9ywZ39Wz1nMHyeNeVKAlS5kBp4SMI4QE=; h=Date:References:To:Reply-To:Subject:List-Id:List-Archive: List-Help:List-Owner:List-Post:List-Subscribe:List-Unsubscribe: From:Cc:From; b=aHlVX0Ie/B2yensLEDNJmauzaGCCgoyqScuFvZnNMfyk9E4wJgohjRrtkyOQGXvV2 6ePVeRGJJO04lS+lngNWeIhoa3n5PiwfGdB9tZTm5jEDftjbUwKmn8zz5W7JiBu5H8 5cNRpbuND4cwm/MGgGshO5t/2wiNkIBfoWEclQ4A= Received: from s.wrqvtzvf.outbound-mail.sendgrid.net (s.wrqvtzvf.outbound-mail.sendgrid.net [149.72.126.143]) by nue.mailmanlists.eu (Postfix) with ESMTPS id 774138444F for ; Wed, 24 Apr 2024 15:57:12 +0000 (UTC) Authentication-Results: nue.mailmanlists.eu; dkim=pass (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.a=rsa-sha256 header.s=s1 header.b=Skao4FLD; dkim-atps=neutral DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ruby-lang.org; h=from:references:subject:mime-version:content-type: content-transfer-encoding:list-id:to:cc:content-type:from:subject:to; s=s1; bh=Fq9mAilnuSnGsEEJUeLbbnBFH0zdQQm6I86QeN3eBzE=; b=Skao4FLDn4UsjzBvR5s2gBkmtOs0nLa/06K6IQLP3omxer6G+DwYi49jAvRWLN+IvU8q Z2Fw6EifHyU8NqyuuZVX5RoQgnmnAHbSPCvqcqAnMFAxEVj7c4dwcM3AAMrdfYd/tJWREr hZFOEwtM6qz7zs5o2XWgw6uydAHuc3sFdCYKCdCArfMBEMje/qeF51FaMoJY+o6FxPanTQ t3x7lWA7UQB/m4D6dW4SWW1vP/693+WG4OsROfko/xi+Smy+Mjf2NzJbMuZ6TI0SIj/dRx dXV8eQ0UVape05mVLgFZkW8QZpf2S8mZPAymlrOQi8EwjkwOuYMs5xxuXchFFrPA== Received: by recvd-6b888cd74b-k8brp with SMTP id recvd-6b888cd74b-k8brp-1-66292BD6-1B 2024-04-24 15:57:10.930082654 +0000 UTC m=+1015040.079817277 Received: from herokuapp.com (unknown) by geopod-ismtpd-32 (SG) with ESMTP id hpffXFZyRwCZ4X7ZDR0TKg for ; Wed, 24 Apr 2024 15:57:11.228 +0000 (UTC) Date: Wed, 24 Apr 2024 15:57:11 +0000 (UTC) Message-ID: References: Mime-Version: 1.0 X-Redmine-Project: ruby-master X-Redmine-Issue-Tracker: Feature X-Redmine-Issue-Id: 20448 X-Redmine-Issue-Author: ms-tob X-Redmine-Issue-Priority: Normal X-Redmine-Sender: mame X-Mailer: Redmine X-Redmine-Host: bugs.ruby-lang.org X-Redmine-Site: Ruby Issue Tracking System X-Auto-Response-Suppress: All Auto-Submitted: auto-generated X-Redmine-MailingListIntegration-Message-Ids: 94285 X-SG-EID: =?us-ascii?Q?u001=2Ep+ckLDtT+4Y5c+H0YCkEnsuWiCQmn3OZA=2F9FzjoR6ZZlPaMv54M7EFoSM?= =?us-ascii?Q?CX5Trc79ep2R5F+0oYS4n23jq1cvwn023TyKIaC?= =?us-ascii?Q?Kpq=2F6mYppJXSSEq1AsXLn+zSCZfgFFZM1QY98R9?= =?us-ascii?Q?WZnV6=2F+1WQFXT+MZybW+O9ej1H6=2F8K2Gt5lCbSr?= =?us-ascii?Q?axiQa0bBMzwIof7ZT9jozBNX8J7bbVTrniAGLTI?= =?us-ascii?Q?a1PInx3bh+i=2Fkqa9PjGA6ZogZ8raNHi7joX1ZzH?= =?us-ascii?Q?sPjo?= To: ruby-core@ml.ruby-lang.org X-Entity-ID: u001.I8uzylDtAfgbeCOeLBYDww== Message-ID-Hash: R5UN2PK6XZS45CJ4BTKBT5BXFZBLOBZK X-Message-ID-Hash: R5UN2PK6XZS45CJ4BTKBT5BXFZBLOBZK X-MailFrom: bounces+313651-b711-ruby-core=ml.ruby-lang.org@em5188.ruby-lang.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.3 Precedence: list Reply-To: Ruby developers Subject: [ruby-core:117689] [Ruby master Feature#20448] Make coverage event hooking C API public List-Id: Ruby developers Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: From: "mame (Yusuke Endoh) via ruby-core" Cc: "mame (Yusuke Endoh)" Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Issue #20448 has been updated by mame (Yusuke Endoh). Oh, this API does not allow to get information of branch_from and branch_to for branches that are never fired. It is insufficient to reimplement coverage.so on TracePoint. We have to actually create a patch and identify the appropriate API. ---------------------------------------- Feature #20448: Make coverage event hooking C API public https://bugs.ruby-lang.org/issues/20448#change-108098 * Author: ms-tob (Matt S) * Status: Open ---------------------------------------- # Abstract Gathering code coverage information is a well-known goal within software engineering. It is most commonly used to assess code coverage during automated testing. A lesser known use-case is coverage-guided fuzz testing, which will be the primary use-case presented in this issue. This issue exists to request that Ruby coverage event hooking be made part of its official, public C API. # Background Ruby currently provides a number of avenues for hooking events *or* gathering coverage information: 1. The [Coverage](https://ruby-doc.org/3.3.0/exts/coverage/Coverage.html) module 2. The [TracePoint](https://ruby-doc.org/3.3.0/TracePoint.html) module 3. The [rb_add_event_hook](https://ruby-doc.org/3.3.0/extension_rdoc.html#label-Hooks+for+the+interpreter+events) extension function Unfortunately, none of these pieces of functionality solve this issue's specific use-case. The `Coverage` module is not a great fit for real-time coverage analysis with an unknown start and stop point. Coverage-guided fuzz testing requires this. The `TracePoint` module and `rb_add_event_hook` are not able to hook branch and line coverage events. Coverage-guided fuzz testing typically tracks branch events. # Proposal The ultimate goal is to enable Ruby C extensions to process coverage events in real-time. I did some cursory investigation into the Ruby C internals to determine what it would take to achieve this, but I'm by no means an expert, so my list may be incomplete. The good news is that much of this functionality already exists, but it's part of the private, internal-only C API. 1. Make `RUBY_EVENT_COVERAGE_LINE` and `RUBY_EVENT_COVERAGE_BRANCH` public: https://github.com/ruby/ruby/blob/v3_3_0/vm_core.h#L2182-L2184 a. This would be an addition to the current public event types: https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/internal/event.h#L32-L46 2. Allow initializing global coverage state so that coverage tracking can be fully enabled a. Currently, if `Coverage.setup` or `Coverage.start` is not called, then coverage events cannot be hooked. I do not fully understand why this is, but I believe it has something to do with `rb_get_coverages` and `rb_set_coverages`. If calls to `rb_get_coverages` return `NULL` (https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L641-L647, https://github.com/ruby/ruby/blob/v3_3_0/iseq.c#L864-L868), then coverage hooking will not be enabled. I believe the `Coverage` module initializes that state via a `rb_set_coverages` call here: https://github.com/ruby/ruby/blob/v3_3_0/ext/coverage/coverage.c#L112-L120. b. So, to achieve this goal, a C extension would need to be able to call `rb_set_coverages` or somehow initialize the global coverage state. I've actually been able to achieve this functionality by calling undocumented features and defining `RUBY_EVENT_COVERAGE_BRANCH`: ```c #include #include #define RUBY_EVENT_COVERAGE_BRANCH 0x020000 // ... rb_event_flag_t events = RUBY_EVENT_COVERAGE_BRANCH; rb_event_hook_flag_t flags = ( RUBY_EVENT_HOOK_FLAG_SAFE | RUBY_EVENT_HOOK_FLAG_RAW_ARG ); rb_add_event_hook2( (rb_event_hook_func_t) event_hook_branch, events, counter_hash, flags ); ``` If I call `Coverage.setup(branches: true)`, and add this event hook, then branch hooking works as expected. `rb_add_event_hook2` will still respect the `RUBY_EVENT_COVERAGE_BRANCH` value if its passed. But it would be better if I could rely on official functionality rather than undocumented features. The above two points would be requirements for this functionality, but there's an additional nice-to-have: 3. Extend the public `tracearg` functionality to include additional coverage information a. Currently, `tracearg` offers information like `rb_tracearg_lineno` and `rb_tracearg_path`. It would be helpful if it also provided additional coverage information like `coverage.c`'s column information and a unique identifier for each branch. Currently, I can only use `(path, lineno)` as a unique identifier for a branch because that's what's offered by the public API, but more information like column number would be helpful for uniquely identify branches. Since there can be multiple `if` statements on a single line, this can provide ambiguous identification for a branch event. # Use cases This use-case was born out of a new coverage-guided Ruby fuzzer: https://github.com/trailofbits/ruzzy. You can read more about its implementation details here: https://blog.trailofbits.com/2024/03/29/introducing-ruzzy-a-coverage-guided-ruby-fuzzer/. You can also find the Ruby C extension code behind its implementation here: https://github.com/trailofbits/ruzzy/blob/v0.7.0/ext/cruzzy/cruzzy.c#L220-L231. So, the primary use-case here is enabling real-time, coverage-guided fuzz testing of Ruby code. However, as mentioned in the abstract, gathering code coverage information is useful in many domains. For example, it could enable new workflows in standard unit/integration test coverage. It could also enable gathering coverage information in real-time as an application is running. I see this as the most generalized form of gathering code coverage information, and something like the `Coverage` module as a specialized implementation. Another example, https://bugs.ruby-lang.org/issues/20282 may be solved by this more generalized solution. We are tracking this request downstream here: https://github.com/trailofbits/ruzzy/issues/9 # Discussion Fuzz testing is another tool in a testers toolbelt. It is an increasingly common way to improve software's robustness. Go has it built in to the standard library, Python has Atheris, Java has Jazzer, JavaScript has Jazzer.js, etc. OSS-Fuzz has helped identify and fix over 10,000 vulnerabilities and 36,000 bugs [using fuzzing](https://google.github.io/oss-fuzz/#trophies). Ruby deserves a good fuzzer, and improving coverage gathering would help achieve that goal. The `Coverage` module, `TracePoint` module, and `rb_add_event_hook` function seem like they could fulfill this goal. However, after deeper investigation, none of them fit the exact requirements for this use-case. # See also - https://bugs.ruby-lang.org/issues/20282 - https://github.com/google/atheris - https://security.googleblog.com/2020/12/how-atheris-python-fuzzer-works.html - https://github.com/CodeIntelligenceTesting/jazzer/ - https://www.code-intelligence.com/blog/java-fuzzing-with-jazzer - https://go.dev/doc/security/fuzz/ -- https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/