ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:109977] [Ruby master Feature#19013] Error Tolerant Parser
@ 2022-09-21 12:28 yui-knk (Kaneko Yuichiro)
  2022-09-22  3:04 ` [ruby-core:109984] " duerst
  2022-09-22 21:10 ` [ruby-core:110003] " matz (Yukihiro Matsumoto)
  0 siblings, 2 replies; 3+ messages in thread
From: yui-knk (Kaneko Yuichiro) @ 2022-09-21 12:28 UTC (permalink / raw)
  To: ruby-core

Issue #19013 has been reported by yui-knk (Kaneko Yuichiro).

----------------------------------------
Feature #19013: Error Tolerant Parser
https://bugs.ruby-lang.org/issues/19013

* Author: yui-knk (Kaneko Yuichiro)
* Status: Open
* Priority: Normal
----------------------------------------
# Background

Implementations of the Language Server Protocol (LSP) sometimes need to parse an incomplete Ruby script. For example, a user may want to complete an expression in the middle of a statement, as below:

```ruby
class A
  def m
    a = 10
    if # here users want to run completion
  end
end
```

In such a case, an LSP implementation wants to get a partial AST instead of a syntax error.
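
To make the current behavior concrete, here is a minimal sketch that feeds the snippet above to `RubyVM::AbstractSyntaxTree.parse` without any tolerance option. It assumes CRuby (where `RubyVM` is defined); the variable names `source` and `outcome` are only for illustration.

```ruby
# The incomplete script from the example above.
source = <<~RUBY
  class A
    def m
      a = 10
      if # here users want to run completion
    end
  end
RUBY

# Without error tolerance the parser rejects the whole script,
# so no AST at all is available to a completion engine.
outcome =
  begin
    RubyVM::AbstractSyntaxTree.parse(source)
    :parsed
  rescue SyntaxError
    :syntax_error
  end

puts outcome
```

The point is that one unfinished `if` discards the tree for the whole file, including the parts (`class A`, `def m`, `a = 10`) that are well formed.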

# Proposal

At the moment I want to propose 3 types of tolerance.

## 1. Complement `end` when the lexer hits end-of-input but there are not enough `end`s

Here is such a case. The lexer will generate one `end` before it generates end-of-input.

```ruby
describe "1" do
  describe "2" do
    describe "3" do
      it "here" do
    end
  end
end
```
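
The idea can be illustrated with a toy, line-based sketch. This is not how the proposed lexer works internally (the real change lives in the lexer and tracks its own state); the helper names `missing_end_count` and `complement_ends` are hypothetical, and the regular expressions only handle the simple `do ... end` shape shown above.

```ruby
# Toy illustration only: count block openers and closers line by line.
# Only lines ending in `do` and lines consisting of `end` are considered.
def missing_end_count(source)
  opened = source.each_line.count { |line| line.match?(/\bdo\s*$/) }
  closed = source.each_line.count { |line| line.match?(/^\s*end\s*$/) }
  [opened - closed, 0].max
end

# Append as many `end` lines as are missing, mimicking a lexer that
# emits extra `end` tokens just before end-of-input.
def complement_ends(source)
  source + "end\n" * missing_end_count(source)
end

source = <<~RUBY
  describe "1" do
    describe "2" do
      describe "3" do
        it "here" do
      end
    end
  end
RUBY

patched = complement_ends(source)
puts missing_end_count(source)
puts patched
```

For this input there are four `do` openers and three `end` lines, so exactly one `end` is appended.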

## 2. Recognize "end" as a keyword, not an identifier, based on indentation

Here is such a case. The normal parser recognizes "end" on line 4 as a "local variable or method", because it directly follows the `.` on line 3.
This causes not only a syntax error but also causes the `bar` method definition to be treated as `Z::Foo#bar`.
Another approach is to suppress the `!IS_lex_state(EXPR_DOT)` checks for "end".

```ruby
module Z
  class Foo
    foo.
  end

  def bar
  end
end
```
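
The reason the normal parser behaves this way is that a name following `.` is always read as a method name, even across a line break and even when it spells a keyword. A small runnable sketch, using the built-in `Range#end` (the variable names are only for illustration):

```ruby
range = (1..3)

# The line break after `.` does not terminate the expression, and the
# `end` on the next line is read as the method name, i.e. `range.end`.
last = range.
  end

puts last
```

This is valid Ruby today, which is why the proposal relies on indentation (or on relaxing the lexer-state check) to decide when `end` should close a block instead.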

## 3. Change locations of `error`

Currently `error` is put into `top_stmts` and `stmts`, as in `top_stmts: error top_stmt` and `stmts: error stmt`.
However, these are too strict to catch syntax errors, so I want to move it to `stmt: error` and `expr_value: error`.

# Interface

* Add an `error_tolerant` option to `RubyVM::AbstractSyntaxTree.parse`
* Add an `--error-tolerant-parser` option to the ruby command for debugging
  * This option is valid only when `--dump=yydebug`, `--dump=parsetree` or `--dump=parsetree_with_comment` is passed
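
A minimal sketch of how a caller might use the first option, assuming the running Ruby accepts the proposed `error_tolerant:` keyword on `RubyVM::AbstractSyntaxTree.parse`. The helper name `parse_tolerantly` is hypothetical, and the shape of the returned tree is not specified here.

```ruby
# Hypothetical helper: try the proposed option and fall back to nil.
# ArgumentError: this Ruby does not know the `error_tolerant:` keyword.
# SyntaxError:   the input was still rejected by the parser.
def parse_tolerantly(source)
  RubyVM::AbstractSyntaxTree.parse(source, error_tolerant: true)
rescue ArgumentError, SyntaxError
  nil
end

source = <<~RUBY
  class A
    def m
      a = 10
      if # here users want to run completion
    end
  end
RUBY

tree = parse_tolerantly(source)
puts(tree ? "got a partial AST: #{tree.type}" : "no AST available")
```

Returning `nil` on failure keeps the caller (for example an LSP server) on a single code path whether or not the option is supported.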

# Compatibility

Changing the location of `error` can lead to incompatibility. I observed at least 2 test cases in ruby/ruby that are broken by this change.
I think both of them depend on how ripper behaves after it raises a syntax error.

* RDoc: https://github.com/yui-knk/ruby/commit/1dabbe508f0cc3dd4f83aa72502bbf347029dd8c
  * However, the Ruby script in the heredoc is invalid...
* irb: https://github.com/yui-knk/ruby/commit/e18be19ecd044eb26a56f6f9ba4f19d40c01a9c7
  * The range of error coloring is changed

All other changes are related to the lexer, not the parser, and they are controlled by the `error_tolerant` option. Therefore no behavior change is expected for the Ruby parser and ripper.

# Implementation

https://github.com/yui-knk/ruby/tree/error_recovery_indent_aware




-- 
https://bugs.ruby-lang.org/


* [ruby-core:109984] [Ruby master Feature#19013] Error Tolerant Parser
  2022-09-21 12:28 [ruby-core:109977] [Ruby master Feature#19013] Error Tolerant Parser yui-knk (Kaneko Yuichiro)
@ 2022-09-22  3:04 ` duerst
  2022-09-22 21:10 ` [ruby-core:110003] " matz (Yukihiro Matsumoto)
  1 sibling, 0 replies; 3+ messages in thread
From: duerst @ 2022-09-22  3:04 UTC (permalink / raw)
  To: ruby-core

Issue #19013 has been updated by duerst (Martin Dürst).


The topic of parsing incomplete syntax also came up in Kevin Newton's talk (see https://rubykaigi.org/2022/presentations/kddnewton.html) at RubyKaigi 2022. In the talk, he said he is working on a new parser. Maybe these efforts could be combined?

----------------------------------------
Feature #19013: Error Tolerant Parser
https://bugs.ruby-lang.org/issues/19013#change-99233

* Author: yui-knk (Kaneko Yuichiro)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/


* [ruby-core:110003] [Ruby master Feature#19013] Error Tolerant Parser
  2022-09-21 12:28 [ruby-core:109977] [Ruby master Feature#19013] Error Tolerant Parser yui-knk (Kaneko Yuichiro)
  2022-09-22  3:04 ` [ruby-core:109984] " duerst
@ 2022-09-22 21:10 ` matz (Yukihiro Matsumoto)
  1 sibling, 0 replies; 3+ messages in thread
From: matz (Yukihiro Matsumoto) @ 2022-09-22 21:10 UTC (permalink / raw)
  To: ruby-core

Issue #19013 has been updated by matz (Yukihiro Matsumoto).


Kevin's work has broader goals, e.g. being faster and consuming less memory, and should be free from yacc/bison limitations.
I consider this work an experiment to explore error tolerance.

Matz.



----------------------------------------
Feature #19013: Error Tolerant Parser
https://bugs.ruby-lang.org/issues/19013#change-99254

* Author: yui-knk (Kaneko Yuichiro)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/

