* [ruby-core:109977] [Ruby master Feature#19013] Error Tolerant Parser
@ 2022-09-21 12:28 yui-knk (Kaneko Yuichiro)
2022-09-22 3:04 ` [ruby-core:109984] " duerst
2022-09-22 21:10 ` [ruby-core:110003] " matz (Yukihiro Matsumoto)
0 siblings, 2 replies; 3+ messages in thread
From: yui-knk (Kaneko Yuichiro) @ 2022-09-21 12:28 UTC
To: ruby-core
Issue #19013 has been reported by yui-knk (Kaneko Yuichiro).
----------------------------------------
Feature #19013: Error Tolerant Parser
https://bugs.ruby-lang.org/issues/19013
* Author: yui-knk (Kaneko Yuichiro)
* Status: Open
* Priority: Normal
----------------------------------------
# Background
An implementation of the Language Server Protocol (LSP) sometimes needs to parse an incomplete Ruby script. For example, a user may want to run completion in the middle of a statement, as below:
```ruby
class A
  def m
    a = 10
    if # here users want to run completion
  end
end
```
In such a case, the LSP implementation wants to get a partial AST instead of a syntax error.
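For reference, a minimal sketch of the current behavior that motivates this request: without any tolerance, `RubyVM::AbstractSyntaxTree.parse` raises `SyntaxError` for the script above and returns no tree at all. The names `INCOMPLETE_SOURCE` and `parse_result` are only for illustration.

```ruby
# Incomplete source from the example above: the `if` has no condition yet.
INCOMPLETE_SOURCE = <<~RUBY
  class A
    def m
      a = 10
      if # here users want to run completion
    end
  end
RUBY

# Illustrative helper: return the AST root, or the SyntaxError that was raised.
def parse_result(source)
  RubyVM::AbstractSyntaxTree.parse(source)
rescue SyntaxError => e
  e
end

result = parse_result(INCOMPLETE_SOURCE)
puts result.class
```

The caller only learns that parsing failed; the nodes for `class A`, `def m` and `a = 10` that a completion engine would need are not available.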
# Proposal
At the moment I want to propose three types of tolerance.
## 1. Complement `end` when the lexer hits end-of-input but there are not enough `end`s
Here is an example. The lexer will generate one `end` before it generates end-of-input.
```ruby
describe "1" do
  describe "2" do
    describe "3" do
      it "here" do
      end
    end
  end
```
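The effect can be approximated outside the parser. The sketch below is not the proposed implementation (which works inside the lexer); it is a userland stand-in that appends `end` lines until `Ripper.sexp` accepts the source. The helper name `complement_ends` and its `limit` parameter are hypothetical.

```ruby
require "ripper"

# Source from the example above: four `do` blocks but only three `end`s.
MISSING_END_SOURCE = <<~RUBY
  describe "1" do
    describe "2" do
      describe "3" do
        it "here" do
        end
      end
    end
RUBY

# Hypothetical helper: append "end" lines until the source parses.
# Returns [patched_source, number_of_ends_added], or nil if `limit`
# extra `end`s are not enough (the error is probably something else).
def complement_ends(source, limit: 10)
  candidate = source.end_with?("\n") ? source.dup : "#{source}\n"
  0.upto(limit) do |added|
    # Ripper.sexp returns nil when the source has a syntax error.
    return [candidate, added] if Ripper.sexp(candidate)
    candidate += "end\n"
  end
  nil
end

_patched, added = complement_ends(MISSING_END_SOURCE)
puts "added #{added} end(s)"
```

Re-parsing after each attempt is wasteful compared with generating the token in the lexer, which is one reason to do this inside the parser rather than in each tool.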
## 2. Treat "end" as a keyword, not an identifier, based on its indentation
Here is an example. The normal parser recognizes the "end" on line 4 as a "local variable or method" because it directly follows `foo.`.
This not only causes a syntax error, but the `bar` method definition is also treated as `Z::Foo#bar`.
Another approach is to suppress the `!IS_lex_state(EXPR_DOT)` check for "end".
```ruby
module Z
  class Foo
    foo.
  end
  def bar
  end
end
```
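The lexer behavior described here can be observed with `Ripper.lex`. The sketch below lists the event type of every `end` token in the example; the helper name `end_token_events` is made up for illustration.

```ruby
require "ripper"

DOT_END_SOURCE = <<~RUBY
  module Z
    class Foo
      foo.
    end
    def bar
    end
  end
RUBY

# Illustrative helper: [line, event] for every token whose text is "end".
def end_token_events(source)
  Ripper.lex(source)
        .select { |_pos, _event, text| text == "end" }
        .map { |pos, event, _text| [pos[0], event] }
end

events = end_token_events(DOT_END_SOURCE)
events.each { |line, event| puts "line #{line}: #{event}" }
```

The `end` on line 4 is reported as an identifier (`:on_ident`) because it is read as the method name of `foo.end`, while the later ones are keywords (`:on_kw`). The proposal would use the indentation of line 4 to treat it as a keyword as well.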
## 3. Change locations of `error`
Currently `error` appears in `top_stmts` and `stmts`, as in `top_stmts: error top_stmt` and `stmts: error stmt`.
However, these rules are too strict to catch syntax errors, so I want to move `error` to `stmt: error` and `expr_value: error`.
# Interface
* Add an `error_tolerant` option to `RubyVM::AbstractSyntaxTree.parse`
* Add an `--error-tolerant-parser` option to the ruby command for debugging
  * This option is valid only when `--dump=yydebug`, `--dump=parsetree` or `--dump=parsetree_with_comment` is passed
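A sketch of how the proposed option might be used from Ruby. It assumes a Ruby build in which `RubyVM::AbstractSyntaxTree.parse` accepts the `error_tolerant` keyword as proposed here; on a build without it the call raises `ArgumentError`, which the hypothetical wrapper `tolerant_parse` turns into `nil`.

```ruby
# Hypothetical wrapper around the proposed option.
# Returns the AST root, or nil if the option is unavailable or parsing fails.
def tolerant_parse(source)
  RubyVM::AbstractSyntaxTree.parse(source, error_tolerant: true)
rescue ArgumentError, SyntaxError
  nil
end

# Two `end`s are missing here, so a normal parse raises SyntaxError.
root = tolerant_parse("class A\n  def m\n    a = 10\n")
puts(root ? root.type : "error_tolerant is not available on this Ruby")
```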
# Compatibility
Changing the location of `error` can lead to incompatibility. I observed at least 2 test cases in ruby/ruby that are broken by this change.
I think both of them depend on how ripper behaves after it raises a syntax error.
* RDoc: https://github.com/yui-knk/ruby/commit/1dabbe508f0cc3dd4f83aa72502bbf347029dd8c
  * However, the ruby script in the heredoc is invalid...
* irb: https://github.com/yui-knk/ruby/commit/e18be19ecd044eb26a56f6f9ba4f19d40c01a9c7
  * The range of error coloring is changed
All other changes are in the lexer, not the parser, and they are controlled by the `error_tolerant` option. Therefore these changes are expected to cause no behavior change for the ruby parser or ripper.
# Implementation
https://github.com/yui-knk/ruby/tree/error_recovery_indent_aware
--
https://bugs.ruby-lang.org/
* [ruby-core:109984] [Ruby master Feature#19013] Error Tolerant Parser
From: duerst @ 2022-09-22 3:04 UTC
To: ruby-core
Issue #19013 has been updated by duerst (Martin Dürst).
The topic of parsing incomplete syntax also came up in Kevin Newton's talk (see https://rubykaigi.org/2022/presentations/kddnewton.html) at RubyKaigi 2022. In the talk, he said he is working on a new parser. Maybe these efforts could be combined?
----------------------------------------
Feature #19013: Error Tolerant Parser
https://bugs.ruby-lang.org/issues/19013#change-99233
* Author: yui-knk (Kaneko Yuichiro)
* Status: Open
* Priority: Normal
----------------------------------------
* [ruby-core:110003] [Ruby master Feature#19013] Error Tolerant Parser
From: matz (Yukihiro Matsumoto) @ 2022-09-22 21:10 UTC
To: ruby-core
Issue #19013 has been updated by matz (Yukihiro Matsumoto).
Kevin's work has broader goals, e.g. being faster, consuming less memory, which should be free from yacc/bison limitation.
I consider this work as an experiment to explore error-tolerant-ness.
Matz.
----------------------------------------
Feature #19013: Error Tolerant Parser
https://bugs.ruby-lang.org/issues/19013#change-99254
* Author: yui-knk (Kaneko Yuichiro)
* Status: Open
* Priority: Normal
----------------------------------------