ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
From: "gareth (Gareth Adams) via ruby-core" <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: "gareth (Gareth Adams)" <noreply@ruby-lang.org>
Subject: [ruby-core:111448] [Ruby master Bug#19266] URI::Generic should use URI::RFC3986_PARSER instead of URI::DEFAULT_PARSER
Date: Mon, 26 Dec 2022 18:52:36 +0000 (UTC)	[thread overview]
Message-ID: <redmine.issue-19266.20221226185236.25391@ruby-lang.org> (raw)
In-Reply-To: redmine.issue-19266.20221226185236.25391@ruby-lang.org

Issue #19266 has been reported by gareth (Gareth Adams).

----------------------------------------
Bug #19266: URI::Generic should use URI::RFC3986_PARSER instead of URI::DEFAULT_PARSER
https://bugs.ruby-lang.org/issues/19266

* Author: gareth (Gareth Adams)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [arm64-darwin21]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
In June 2014, [`uri/common` was updated][1] to introduce a RFC3986-compliant parser (`URI::RFC3986_PARSER`) as an alternative to the previous RFC2396 parser, and common methods like `URI()` were updated to use that new parser by default. The only methods in `common` not updated were [`URI.extract` and `URI.regexp`][2] which are marked as obsolete. (The old parser was kept in the `DEFAULT_PARSER` constant despite it not being the default for those methods, presumably for backward compatibility.)

However, similar [methods called on `URI::Generic`][3] were never updated to use this new parser. This means that methods like `URI::Generic.build` fail when given input that succeeds normally, and this also affects subclasses like URI::HTTP:

```
$ pry -r uri -r uri/common -r uri/generic

[1] pry(main)> URI::Generic.build(host: "underscore_host.example")
URI::InvalidComponentError: bad component(expected host component): underscore_host.example
from /Users/gareth/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/uri/generic.rb:591:in `check_host'

[2] pry(main)> URI::HTTP.build(host: "underscore_host.example")
URI::InvalidComponentError: bad component(expected host component): underscore_host.example
from /Users/gareth/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/uri/generic.rb:591:in `check_host'

[3] pry(main)> URI("http://underscore_host.example")
=> #<URI::HTTP http://underscore_host.example>
```

`URI::Generic.new` allows a configurable `parser` positional argument to override the class' default parser, but other factory methods like `.build` don't allow this override.

Arguably this doesn't cause problems because at least in the case above, the URI can be built with the polymorphic constructor, but having the option to build URIs from explicit named parts is useful, and leaving the outdated functionality in the `Generic` class is ambiguous. It's possible that the whole Generic class and its subclasses aren't intended to be used directly how I'm intending here, but there's nothing I could see that suggested this is the case.

I'm not aware of the entire list of differences between RFC2396 and RFC3986. The relevant difference here is that in RFC2396 an individual segment of a host ([`domainlabel`s][4]) could only be `alphanum | alphanum *( alphanum | "-" ) alphanum`, whereas RFC3986 allows [hostnames][5] to include any of `ALPHA / DIGIT / "-" / "." / "_" / "~"`. It's possible that other differences might cause issues for developers, but since this has gone over 8 years without anyone else caring about this, this is definitely not especially urgent.

[1]: https://github.com/ruby/ruby/commit/bb83f32dc3e0424d25fa4e55d8ff32b061320e41
[2]: https://github.com/ruby/ruby/blob/28a17436503c3c4cb7a35b423a894b697cd80da9/lib/uri/common.rb#L233-L297
[3]: https://github.com/ruby/ruby/blob/28a17436503c3c4cb7a35b423a894b697cd80da9/lib/uri/generic.rb#L169-L175
[4]: https://www.rfc-editor.org/rfc/rfc2396#section-3.2.2
[5]: https://www.rfc-editor.org/rfc/rfc3986#page-13



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

       reply	other threads:[~2022-12-26 18:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-26 18:52 gareth (Gareth Adams) via ruby-core [this message]
2023-01-10  5:27 ` [ruby-core:111767] [Ruby master Bug#19266] URI::Generic should use URI::RFC3986_PARSER instead of URI::DEFAULT_PARSER gareth (Gareth Adams) via ruby-core
2023-01-16 17:10 ` [ruby-core:111839] " gareth (Gareth Adams) via ruby-core
2024-04-09 19:34 ` [ruby-core:117480] " jeremyevans0 (Jeremy Evans) via ruby-core

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.ruby-lang.org/en/community/mailing-lists/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=redmine.issue-19266.20221226185236.25391@ruby-lang.org \
    --to=ruby-core@ruby-lang.org \
    --cc=noreply@ruby-lang.org \
    --cc=ruby-core@ml.ruby-lang.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).