* Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? @ 2009-11-24 18:22 Iñaki Baz Castillo 2009-11-30 16:27 ` Iñaki Baz Castillo 0 siblings, 1 reply; 9+ messages in thread From: Iñaki Baz Castillo @ 2009-11-24 18:22 UTC (permalink / raw) To: Rack Development Hi, I've reported an issue for Thin HTTP server since it cuts PATH_INFO after finding a ";": http://github.com/macournoyer/thin/issues/closed/#issue/6 --------------------------- If Thin receives a request with request line: GET /level1/level2;user=alice/level3/doc.txt HTTP/1.1 then Thin cuts the URI path and leaves: env{"PATH_INFO"} => "/level1/level2" Why? I've looked for such behavior in RFC 3986 and RFC 2616 but I don't find a reason to remove all the path content after ";". --------------------------- A Thin developer closed the bug with the argument: ---------------------------- the env object is constructed according to Rack specs: http://rack.rubyforge.org/doc/SPEC.html If you want the full URI look into env["REQUEST_URI"] ---------------------------- However I don't find in the SPEC where the PATH_INFO must be cut before the first semicolon char. I expect that is not correct at all since if ";" appears in the PATH (I mean before the query) then it's a valid char for the PATH. In fact this issue is breaking my application since Rack receives requests containing ";" in the path like: GET /app/user%25b@id=%22ibc@domain.org;nat=yes%22%2d which after hex-unescaping becomes: GET /app/user[@id="ibc@domain.org;nat=yes"] Thin (and Mongrel) creates PATH_INFO as follows: /app/user%25b@id=%22ibc@domain.org A workaround is forcing the client to escape the semicolon, but this violates the application specs as ";" is not required to be encoded in the HTTP uri (it could be encoded or not, but it's not required). IMHO it's a bug in Rack SPEC (even if I don't find that point) or in Thin/Mongrel implementation. Thanks a lot for any comment. -- Iñaki Baz Castillo <ibc@aliax.net> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? 2009-11-24 18:22 Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? Iñaki Baz Castillo @ 2009-11-30 16:27 ` Iñaki Baz Castillo 2009-11-30 23:31 ` Hongli Lai [not found] ` <8108c6090912061939h7f2ae70rac675219ed08f2e8@mail.gmail.com> 0 siblings, 2 replies; 9+ messages in thread From: Iñaki Baz Castillo @ 2009-11-30 16:27 UTC (permalink / raw) To: Rack Development Hi, could I get some reply to this mail please? I strongly think it could be a bug in Rack so I can't understand why it is ignored. In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an allowed character into the URI path sections. Regards. El Martes, 24 de Noviembre de 2009, Iñaki Baz Castillo escribió: > Hi, I've reported an issue for Thin HTTP server since it cuts PATH_INFO > after finding a ";": > > http://github.com/macournoyer/thin/issues/closed/#issue/6 > > --------------------------- > If Thin receives a request with request line: > > GET /level1/level2;user=alice/level3/doc.txt HTTP/1.1 > > then Thin cuts the URI path and leaves: > env{"PATH_INFO"} => "/level1/level2" > > Why? I've looked for such behavior in RFC 3986 and RFC 2616 but I don't > find a reason to remove all the path content after ";". > --------------------------- > > > A Thin developer closed the bug with the argument: > > ---------------------------- > the env object is constructed according to Rack specs: > http://rack.rubyforge.org/doc/SPEC.html > If you want the full URI look into env["REQUEST_URI"] > ---------------------------- > > > However I don't find in the SPEC where the PATH_INFO must be cut before the > first semicolon char. > > I expect that is not correct at all since if ";" appears in the PATH (I > mean before the query) then it's a valid char for the PATH. > > In fact this issue is breaking my application since Rack receives requests > containing ";" in the path like: > > GET /app/user%25b@id=%22ibc@domain.org;nat=yes%22%2d > > which after hex-unescaping becomes: > > GET /app/user[@id="ibc@domain.org;nat=yes"] > > Thin (and Mongrel) creates PATH_INFO as follows: > > /app/user%25b@id=%22ibc@domain.org > > > A workaround is forcing the client to escape the semicolon, but this > violates the application specs as ";" is not required to be encoded in the > HTTP uri (it could be encoded or not, but it's not required). > > > IMHO it's a bug in Rack SPEC (even if I don't find that point) or in > Thin/Mongrel implementation. > > Thanks a lot for any comment. > -- Iñaki Baz Castillo <ibc@aliax.net> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? 2009-11-30 16:27 ` Iñaki Baz Castillo @ 2009-11-30 23:31 ` Hongli Lai 2009-12-01 0:42 ` Iñaki Baz Castillo [not found] ` <8108c6090912061939h7f2ae70rac675219ed08f2e8@mail.gmail.com> 1 sibling, 1 reply; 9+ messages in thread From: Hongli Lai @ 2009-11-30 23:31 UTC (permalink / raw) To: Rack Development On Nov 30, 5:27 pm, Iñaki Baz Castillo <i...@aliax.net> wrote: > Hi, could I get some reply to this mail please? I strongly think it could be a > bug in Rack so I can't understand why it is ignored. > > In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an > allowed character into the URI path sections. As far as I know the Rack spec doesn't mention anything about ";". Neither do the CGI and WSGI specs for that matter: http://hoohoo.ncsa.illinois.edu/cgi/env.html http://www.python.org/dev/peps/pep-0333/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? 2009-11-30 23:31 ` Hongli Lai @ 2009-12-01 0:42 ` Iñaki Baz Castillo 2009-12-05 22:54 ` Scytrin dai Kinthra 0 siblings, 1 reply; 9+ messages in thread From: Iñaki Baz Castillo @ 2009-12-01 0:42 UTC (permalink / raw) To: rack-devel 2009/12/1 Hongli Lai <hongli@phusion.nl>: > On Nov 30, 5:27 pm, Iñaki Baz Castillo <i...@aliax.net> wrote: >> Hi, could I get some reply to this mail please? I strongly think it could be a >> bug in Rack so I can't understand why it is ignored. >> >> In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an >> allowed character into the URI path sections. > > As far as I know the Rack spec doesn't mention anything about ";". > Neither do the CGI and WSGI specs for that matter: > http://hoohoo.ncsa.illinois.edu/cgi/env.html > http://www.python.org/dev/peps/pep-0333/ Yes, as I said I inspected the BNF grammar section for both URI and HTTP specifications, and ";" is an allowed char into an URI segment. But the fact is that Thin and Mongrel cut the PATH_INFO when finding a ";". Unfortunatelly the author of Thin closed the bug: http://github.com/macournoyer/thin/issues/closed/#issue/6 -- Iñaki Baz Castillo <ibc@aliax.net> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? 2009-12-01 0:42 ` Iñaki Baz Castillo @ 2009-12-05 22:54 ` Scytrin dai Kinthra 2009-12-05 23:15 ` Iñaki Baz Castillo 0 siblings, 1 reply; 9+ messages in thread From: Scytrin dai Kinthra @ 2009-12-05 22:54 UTC (permalink / raw) To: rack-devel After going through the various handlers and routing middlewares, I am unaware of any apps that modify the PATH_INFO in the way described. The split of PATH_INFO at ';' seems to occur before the request hits rack. On Mon, Nov 30, 2009 at 16:42, Iñaki Baz Castillo <ibc@aliax.net> wrote: > 2009/12/1 Hongli Lai <hongli@phusion.nl>: >> On Nov 30, 5:27 pm, Iñaki Baz Castillo <i...@aliax.net> wrote: >>> Hi, could I get some reply to this mail please? I strongly think it could be a >>> bug in Rack so I can't understand why it is ignored. >>> >>> In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an >>> allowed character into the URI path sections. >> >> As far as I know the Rack spec doesn't mention anything about ";". >> Neither do the CGI and WSGI specs for that matter: >> http://hoohoo.ncsa.illinois.edu/cgi/env.html >> http://www.python.org/dev/peps/pep-0333/ > > Yes, as I said I inspected the BNF grammar section for both URI and > HTTP specifications, and ";" is an allowed char into an URI segment. > But the fact is that Thin and Mongrel cut the PATH_INFO when finding a ";". > > Unfortunatelly the author of Thin closed the bug: > http://github.com/macournoyer/thin/issues/closed/#issue/6 > > > -- > Iñaki Baz Castillo > <ibc@aliax.net> > -- stadik.net ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? 2009-12-05 22:54 ` Scytrin dai Kinthra @ 2009-12-05 23:15 ` Iñaki Baz Castillo 2009-12-07 2:32 ` Eric Wong 0 siblings, 1 reply; 9+ messages in thread From: Iñaki Baz Castillo @ 2009-12-05 23:15 UTC (permalink / raw) To: rack-devel El Sábado, 5 de Diciembre de 2009, Scytrin dai Kinthra escribió: > After going through the various handlers and routing middlewares, I am > unaware of any apps that modify the PATH_INFO in the way described. > The split of PATH_INFO at ';' seems to occur before the request hits > rack. I'm really sorry as I forgot to update this thread. Thin's developer already confirmed me that this is a bug in Mongrel parser: http://github.com/macournoyer/thin/issues#issue/7 The fact is that using Webrick the issue doesn't happen. It occurs with Mongrel itself and Thin as both uses Mongrel HTTP parser. -- Iñaki Baz Castillo <ibc@aliax.net> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? 2009-12-05 23:15 ` Iñaki Baz Castillo @ 2009-12-07 2:32 ` Eric Wong 2009-12-07 9:53 ` Iñaki Baz Castillo 0 siblings, 1 reply; 9+ messages in thread From: Eric Wong @ 2009-12-07 2:32 UTC (permalink / raw) To: rack-devel; +Cc: Marc-Andre Cournoyer Iñaki Baz Castillo <ibc@aliax.net> wrote: > El Sábado, 5 de Diciembre de 2009, Scytrin dai Kinthra escribió: > > After going through the various handlers and routing middlewares, I am > > unaware of any apps that modify the PATH_INFO in the way described. > > The split of PATH_INFO at ';' seems to occur before the request hits > > rack. > > I'm really sorry as I forgot to update this thread. Thin's developer already > confirmed me that this is a bug in Mongrel parser: > > http://github.com/macournoyer/thin/issues#issue/7 > > The fact is that using Webrick the issue doesn't happen. It occurs with > Mongrel itself and Thin as both uses Mongrel HTTP parser. Hi, This should be a one line fix for Thin, I just pushed out a similar fix for unicorn.git (with tests). From e8dd3e13b9a9f548a3138debd09e87fbb69e3998 Mon Sep 17 00:00:00 2001 From: Eric Wong <normalperson@yhbt.net> Date: Mon, 7 Dec 2009 02:20:18 +0000 Subject: [PATCH] http: PATH_INFO/REQUEST_PATH includes semi-colons This is allowed according to RFC 2396, section 3.3 and matches the behavior of URI.parse, as well. --- ext/unicorn_http/unicorn_http_common.rl | 2 +- lib/unicorn/app/old_rails/static.rb | 6 +--- test/unit/test_http_parser_ng.rb | 46 +++++++++++++++++++++++++++++++ 3 files changed, 48 insertions(+), 6 deletions(-) diff --git a/ext/unicorn_http/unicorn_http_common.rl b/ext/unicorn_http/unicorn_http_common.rl index 5d46087..041dfec 100644 --- a/ext/unicorn_http/unicorn_http_common.rl +++ b/ext/unicorn_http/unicorn_http_common.rl @@ -33,7 +33,7 @@ query = ( uchar | reserved )* %query_string ; param = ( pchar | "/" )* ; params = ( param ( ";" param )* ) ; - rel_path = ( path? %request_path (";" params)? ) ("?" %start_query query)?; + rel_path = (path? (";" params)? %request_path) ("?" %start_query query)?; absolute_path = ( "/"+ rel_path ); path_uri = absolute_path > mark %request_uri; Absolute_URI = (scheme "://" host_with_port path_uri); diff --git a/lib/unicorn/app/old_rails/static.rb b/lib/unicorn/app/old_rails/static.rb index 82f8aa5..13a435e 100644 --- a/lib/unicorn/app/old_rails/static.rb +++ b/lib/unicorn/app/old_rails/static.rb @@ -46,11 +46,7 @@ class Unicorn::App::OldRails::Static < Struct.new(:app, :root, :file_server) end # then try the cached version: - - # grab the semi-colon REST operator used by old versions of Rails - # this is the reason we didn't just copy the new Rails::Rack::Static - env[REQUEST_URI] =~ /^#{Regexp.escape(path_info)}(;[^\?]+)/ - path_info << "#$1#{ActionController::Base.page_cache_extension}" + path_info << ActionController::Base.page_cache_extension if File.file?("#{root}/#{::Rack::Utils.unescape(path_info)}") env[PATH_INFO] = path_info diff --git a/test/unit/test_http_parser_ng.rb b/test/unit/test_http_parser_ng.rb index e84c765..bb61e7f 100644 --- a/test/unit/test_http_parser_ng.rb +++ b/test/unit/test_http_parser_ng.rb @@ -371,4 +371,50 @@ class HttpParserNgTest < Test::Unit::TestCase assert ! parser.headers? end + def test_path_info_semicolon + qs = "QUERY_STRING" + pi = "PATH_INFO" + req = {} + str = "GET %s HTTP/1.1\r\nHost: example.com\r\n\r\n" + { + "/1;a=b?c=d&e=f" => { qs => "c=d&e=f", pi => "/1;a=b" }, + "/1?c=d&e=f" => { qs => "c=d&e=f", pi => "/1" }, + "/1;a=b" => { qs => "", pi => "/1;a=b" }, + "/1;a=b?" => { qs => "", pi => "/1;a=b" }, + "/1?a=b;c=d&e=f" => { qs => "a=b;c=d&e=f", pi => "/1" }, + "*" => { qs => "", pi => "" }, + }.each do |uri,expect| + assert_equal req, @parser.headers(req.clear, str % [ uri ]) + @parser.reset + assert_equal uri, req["REQUEST_URI"], "REQUEST_URI mismatch" + assert_equal expect[qs], req[qs], "#{qs} mismatch" + assert_equal expect[pi], req[pi], "#{pi} mismatch" + next if uri == "*" + uri = URI.parse("http://example.com#{uri}") + assert_equal uri.query.to_s, req[qs], "#{qs} mismatch URI.parse disagrees" + assert_equal uri.path, req[pi], "#{pi} mismatch URI.parse disagrees" + end + end + + def test_path_info_semicolon_absolute + qs = "QUERY_STRING" + pi = "PATH_INFO" + req = {} + str = "GET http://example.com%s HTTP/1.1\r\nHost: www.example.com\r\n\r\n" + { + "/1;a=b?c=d&e=f" => { qs => "c=d&e=f", pi => "/1;a=b" }, + "/1?c=d&e=f" => { qs => "c=d&e=f", pi => "/1" }, + "/1;a=b" => { qs => "", pi => "/1;a=b" }, + "/1;a=b?" => { qs => "", pi => "/1;a=b" }, + "/1?a=b;c=d&e=f" => { qs => "a=b;c=d&e=f", pi => "/1" }, + }.each do |uri,expect| + assert_equal req, @parser.headers(req.clear, str % [ uri ]) + @parser.reset + assert_equal uri, req["REQUEST_URI"], "REQUEST_URI mismatch" + assert_equal "example.com", req["HTTP_HOST"], "Host: mismatch" + assert_equal expect[qs], req[qs], "#{qs} mismatch" + assert_equal expect[pi], req[pi], "#{pi} mismatch" + end + end + end -- Eric Wong ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? 2009-12-07 2:32 ` Eric Wong @ 2009-12-07 9:53 ` Iñaki Baz Castillo 0 siblings, 0 replies; 9+ messages in thread From: Iñaki Baz Castillo @ 2009-12-07 9:53 UTC (permalink / raw) To: rack-devel El Lunes, 7 de Diciembre de 2009, Eric Wong escribió: > This should be a one line fix for Thin, I just pushed out > a similar fix for unicorn.git (with tests). > diff --git a/ext/unicorn_http/unicorn_http_common.rl > b/ext/unicorn_http/unicorn_http_common.rl index 5d46087..041dfec 100644 > --- a/ext/unicorn_http/unicorn_http_common.rl > +++ b/ext/unicorn_http/unicorn_http_common.rl > @@ -33,7 +33,7 @@ > query = ( uchar | reserved )* %query_string ; > param = ( pchar | "/" )* ; > params = ( param ( ";" param )* ) ; > - rel_path = ( path? %request_path (";" params)? ) ("?" %start_query > query)?; > + rel_path = (path? (";" params)? %request_path) ("?" > %start_query query)?; absolute_path = ( "/"+ rel_path ); Thanks, I've reported it to Mongrel and Thin developers. -- Iñaki Baz Castillo <ibc@aliax.net> ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <8108c6090912061939h7f2ae70rac675219ed08f2e8@mail.gmail.com>]
* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? [not found] ` <8108c6090912061939h7f2ae70rac675219ed08f2e8@mail.gmail.com> @ 2009-12-07 11:08 ` masayoshi takahashi 0 siblings, 0 replies; 9+ messages in thread From: masayoshi takahashi @ 2009-12-07 11:08 UTC (permalink / raw) To: rack-devel FYI: PSGI (Perl Web Server Gateway Interface Specification; it's Perl version of Rack or WSGI) has no explicit definition of PATH_INFO's separator, and Plack(PSGI reference implementation) keep ";" in PATH_INFO. cf. http://search.cpan.org/~miyagawa/PSGI-1.03/PSGI.pod http://github.com/miyagawa/Plack http://bulknews.typepad.com/blog/2009/10/request_uri-will-be-in-mojo-support.html Thanks, Masayoshi Takahashi 2009/12/7 masayoshi takahashi <takahashimm@gmail.com>: > Hi, > > 2009/12/1 Iñaki Baz Castillo <ibc@aliax.net>: >> Hi, could I get some reply to this mail please? I strongly think it could be a >> bug in Rack so I can't understand why it is ignored. >> >> In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an >> allowed character into the URI path sections. > > I've read RFC3986 "Uniform Resource Identifier (URI): Generic Syntax". > > * "3.3. Path" in RFC3986: > > Aside from dot-segments in hierarchical paths, a path segment is > considered opaque by the generic syntax. URI producing applications > often use the reserved characters allowed in a segment to delimit > scheme-specific or dereference-handler-specific subcomponents. For > example, the semicolon (";") and equals ("=") reserved characters are > often used to delimit parameters and parameter values applicable to > that segment. The comma (",") reserved character is often used for > similar purposes. For example, one URI producer might use a segment > such as "name;v=1.1" to indicate a reference to version 1.1 of > "name", whereas another might use a segment such as "name,1.1" to > indicate the same. Parameter types may be defined by scheme-specific > semantics, but in most cases the syntax of a parameter is specific to > the implementation of the URI's dereferencing algorithm. > > * "5.4. Reference Resolution Examples" and "5.4.1. Normal Examples" > in RFC3986: > > Within a representation with a well defined base URI of > > http://a/b/c/d;p?q > > a relative reference is transformed to its target URI as follows. > (snip) > "g:h" = "g:h" > "g" = "http://a/b/c/g" > "./g" = "http://a/b/c/g" > "g/" = "http://a/b/c/g/" > "/g" = "http://a/g" > "//g" = "http://g" > "?y" = "http://a/b/c/d;p?y" > "g?y" = "http://a/b/c/g?y" > "#s" = "http://a/b/c/d;p?q#s" > "g#s" = "http://a/b/c/g#s" > "g?y#s" = "http://a/b/c/g?y#s" > ";x" = "http://a/b/c/;x" > "g;x" = "http://a/b/c/g;x" > "g;x?y#s" = "http://a/b/c/g;x?y#s" > (snip) > > I think ";" is used separator like "?" in this document. > > And "2.3. Specific Schemes and their Syntactic Categories" in RFC 1808: > > NOTE: Section 5 of RFC 1738 specifies that the question-mark > character ("?") is allowed in an ftp or file path segment. > However, this is not true in practice and is believed to be an > error in the RFC. Similarly, RFC 1738 allows the reserved > character semicolon (";") within an http path segment, but does > not define its semantics; the correct semantics are as defined > by this document for <params>. > > (notice: RFC1808 is obsoleted by RFC3986) > > I think ";" is confused. So Rack SPEC should not mention about ";". > But, in the real applications like Thin or others, it's not bad > that ";" is treated as ordinary character, not PATH separator. > > Hope this help, > > Masayoshi Takahashi > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-12-07 13:25 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-11-24 18:22 Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? Iñaki Baz Castillo 2009-11-30 16:27 ` Iñaki Baz Castillo 2009-11-30 23:31 ` Hongli Lai 2009-12-01 0:42 ` Iñaki Baz Castillo 2009-12-05 22:54 ` Scytrin dai Kinthra 2009-12-05 23:15 ` Iñaki Baz Castillo 2009-12-07 2:32 ` Eric Wong 2009-12-07 9:53 ` Iñaki Baz Castillo [not found] ` <8108c6090912061939h7f2ae70rac675219ed08f2e8@mail.gmail.com> 2009-12-07 11:08 ` masayoshi takahashi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).