ruby-core@ruby-lang.org archive (unofficial mirror)
From: "ntl (Nathan Ladd) via ruby-core" <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: "ntl (Nathan Ladd)" <noreply@ruby-lang.org>
Subject: [ruby-core:117735] [Ruby master Feature#18583] Pattern-matching: API for custom unpacking strategies?
Date: Sun, 28 Apr 2024 16:11:55 +0000 (UTC)
Message-ID: <redmine.journal-108142.20240428161155.710@ruby-lang.org>
In-Reply-To: redmine.issue-18583.20220212202253.710@ruby-lang.org

Issue #18583 has been updated by ntl (Nathan Ladd).


Could the match operator, `=~`, be used as a general complement to `===`?

Example (following Victor's original sketch):

``` ruby
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(obj)
    @regexp.match?(obj)
  end

  def =~(obj)
    @regexp.match(obj)
  end
end

case "some string"
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  some_named_capture = match_data[:some_named_capture]
  puts "Match: #{some_named_capture}"
end
```

This would add `=~` to the pattern matching protocol, which currently consists of `===`, `deconstruct`, and `deconstruct_keys`. It would make `===` significantly more useful, and regular expressions are a good example of why: when matching a string against a regular expression, the string is already in lexical scope, but the match data is new and only comes into existence upon a successful match:

``` ruby
subject = "some string"

case subject
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  # Capturing the match data instead of the original string doesn't make the original string inaccessible:
  puts "Match subject: #{subject.inspect}"
end
```
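For anyone who wants to experiment before any syntax exists, here is a minimal sketch of the two-step dispatch in plain Ruby. The `case_unpack` helper is hypothetical (it is not part of Ruby or of this proposal); it only stands in for what `case`/`in` would do if it consulted `=~` after a successful `===`.

``` ruby
# Matcher as in the example above: `===` tests, `=~` unpacks.
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(obj)
    @regexp.match?(obj)
  end

  def =~(obj)
    @regexp.match(obj)
  end
end

# Hypothetical helper, not an existing API: emulates the proposed protocol.
# Returns nil when `===` fails; otherwise yields whatever `=~` returns.
def case_unpack(subject, pattern)
  return nil unless pattern === subject

  yield(pattern =~ subject)
end

subject = "some string"
pattern = Matcher.new(/(?<some_named_capture>some) string/)

result = case_unpack(subject, pattern) do |match_data|
  # The block receives the MatchData; the subject is still in scope.
  "Match: #{match_data[:some_named_capture]} (subject: #{subject.inspect})"
end

no_match = case_unpack("other", pattern) { |match_data| match_data }

puts result
```

Note that the sketch only calls `=~` on the custom `Matcher`. Built-in `Regexp#=~` returns the match position rather than a `MatchData`, so how built-ins would take part in such a protocol is left open here.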

I also suspect this could be embedded into the pattern syntax itself, and could allow for some interesting possibilities. One example that leaps to mind is reifying primitive data parsed from JSON into a data structure:

``` ruby
require "json"

SomeStruct = Struct.new(:some_attr, :some_other_attr) do
  def self.===(data)
    data.is_a?(Hash) && data.key?(:some_attr) && data.key?(:some_other_attr)
  end

  def self.=~(data)
    new(**data)
  end
end

# Parse JSON into raw (primitive) data; keys are symbolized so the
# `key?(:some_attr)` checks in `===` can match
some_data = JSON.parse(<<JSON, symbolize_names: true)
{
  "some_attr": "some value",
  "some_other_attr": "some other value"
}
JSON

# Reify data structure from raw data
case some_data
in SomeStruct => some_struct
  puts some_struct.inspect
end
```
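For comparison, this is roughly what the same thing takes in Ruby today, where `in SomeStruct => data` binds the raw hash and the `=~` step has to be written out by hand. It is a sketch rather than a recommendation; `keyword_init: true` is my addition so that `new(**data)` behaves the same on Ruby versions before 3.2.

``` ruby
require "json"

SomeStruct = Struct.new(:some_attr, :some_other_attr, keyword_init: true) do
  # Does the raw data have the shape this struct needs?
  def self.===(data)
    data.is_a?(Hash) && data.key?(:some_attr) && data.key?(:some_other_attr)
  end

  # Build the struct from the raw data.
  def self.=~(data)
    new(**data)
  end
end

raw = JSON.parse(
  '{"some_attr": "some value", "some_other_attr": "some other value"}',
  symbolize_names: true
)

some_struct =
  case raw
  in SomeStruct => data
    # Today `data` is still the Hash, so the unpacking is explicit.
    SomeStruct =~ data
  end

puts some_struct.inspect
```

One side effect to be aware of: once `SomeStruct.===` is overridden like this, `SomeStruct === SomeStruct.new(...)` is false, so instances no longer match `in SomeStruct` or `when SomeStruct`.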

----------------------------------------
Feature #18583: Pattern-matching: API for custom unpacking strategies?
https://bugs.ruby-lang.org/issues/18583#change-108142

* Author: zverok (Victor Shepelev)
* Status: Open
----------------------------------------
I started to think about it when discussing https://github.com/ruby/strscan/pull/30.
The thing is, using StringScanner for many complicated parsers involves some kind of branching.

In pseudocode, the "ideal API" would allow writing something like this:
```ruby
case <what next matches>
in /regexp1/ => value_that_matched
  # use value_that_matched
in /regexp2/ => value_that_matched
  # use value_that_matched
# ...
```
It seems "intuitively" that there *should* be some way of implementing this, but we fall short. We can write a StringScanner-specific matcher object that defines its own `#===` and use it with pinning:
```ruby
case scanner
in ^(Matcher.new(/regexp1/)) => value_that_matched
# ...
```
But there is no API to tell how the match result will be unpacked; the whole `StringScanner` will simply be put into `value_that_matched`.

So, I thought that maybe it would be possible to define some kind of API for pattern-like objects: a method with a signature like `try_match_pattern(value)`, which by default is implemented like `return value if self === value`, but can be redefined to return something different, like a part of the object, or the object transformed somehow.
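A minimal sketch of that idea, called by hand since no pattern-matching hook exists for it: the `TryMatchPattern` module, `PeekMatcher`, and `ScanMatcher` are all hypothetical names. Only `StringScanner#match?` and `StringScanner#scan` are real API.

```ruby
require "strscan"

# Hypothetical mixin holding the default described above:
# return the value unchanged when `===` succeeds, nil otherwise.
module TryMatchPattern
  def try_match_pattern(value)
    value if self === value
  end
end

# Defines only `===`, so it gets the default and yields the whole scanner.
class PeekMatcher
  include TryMatchPattern

  def initialize(regexp)
    @regexp = regexp
  end

  def ===(scanner)
    # match? returns the match length or nil, and does not advance.
    !scanner.match?(@regexp).nil?
  end
end

# Redefines the hook to return the matched text (and advance the scanner).
class ScanMatcher < PeekMatcher
  def try_match_pattern(scanner)
    scanner.scan(@regexp)
  end
end

scanner = StringScanner.new("42 apples")

whole   = PeekMatcher.new(/\d+/).try_match_pattern(scanner) # the scanner itself
number  = ScanMatcher.new(/\d+/).try_match_pattern(scanner) # the matched text
nothing = ScanMatcher.new(/\d+/).try_match_pattern(scanner) # no digits at " apples"

p number
```

Using `nil` as the "no match" signal, as the default above does, means a pattern could never unpack to `nil` itself; a real API would probably need to settle that.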

This would open some interesting (if maybe uncanny) possibilities: not just slicing out the necessary part, but something like
```ruby
value => ^(type_caster(Integer)) => int_value
```
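To make that concrete, here is one way such a `type_caster` could look. Everything is hypothetical (the `TypeCaster` class, its `CONVERTERS` table, and the `try_match_pattern` hook) except `Kernel#Integer` and `Kernel#Float` with `exception: false`. The hook is called directly, since pattern matching has no such extension point today.

```ruby
# Hypothetical pattern object for the `type_caster(Integer)` idea.
class TypeCaster
  # Kernel conversion functions that return nil instead of raising.
  CONVERTERS = {
    Integer => ->(value) { Integer(value, exception: false) },
    Float   => ->(value) { Float(value, exception: false) }
  }.freeze

  def initialize(type)
    @convert = CONVERTERS.fetch(type)
  end

  # "Matches" when the value can be converted.
  def ===(value)
    !@convert.call(value).nil?
  end

  # Unpacks to the converted value, or nil when conversion fails.
  def try_match_pattern(value)
    @convert.call(value)
  end
end

def type_caster(type)
  TypeCaster.new(type)
end

int_value   = type_caster(Integer).try_match_pattern("42")
float_value = type_caster(Float).try_match_pattern("1.5")
rejected    = type_caster(Integer).try_match_pattern("forty-two")

p [int_value, float_value, rejected]
```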

So... Just a discussion topic!





Thread overview: 5+ messages
2022-02-12 20:22 [ruby-core:107564] [Ruby master Feature#18583] Pattern-matching: API for custom unpacking strategies? zverok (Victor Shepelev)
2022-02-12 20:35 ` [ruby-core:107565] " zverok (Victor Shepelev)
2022-02-27  3:24 ` [ruby-core:107752] " hmdne (hmdne -)
2022-03-17 13:10 ` [ruby-core:107950] " palkan (Vladimir Dementyev)
2024-04-28 16:11 ` ntl (Nathan Ladd) via ruby-core [this message]
