rack-devel archive mirror (unofficial) https://groups.google.com/group/rack-devel
 help / color / mirror / Atom feed
* Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)?
@ 2009-11-24 18:22 Iñaki Baz Castillo
  2009-11-30 16:27 ` Iñaki Baz Castillo
  0 siblings, 1 reply; 9+ messages in thread
From: Iñaki Baz Castillo @ 2009-11-24 18:22 UTC (permalink / raw)
  To: Rack Development

Hi, I've reported an issue for Thin HTTP server since it cuts PATH_INFO after 
finding a ";":

  http://github.com/macournoyer/thin/issues/closed/#issue/6

---------------------------
If Thin receives a request with request line:

 GET /level1/level2;user=alice/level3/doc.txt HTTP/1.1

then Thin cuts the URI path and leaves:
 env{"PATH_INFO"} => "/level1/level2"

Why? I've looked for such behavior in RFC 3986 and RFC 2616 but I don't find a 
reason to remove all the path content after ";".
---------------------------


A Thin developer closed the bug with the argument:

----------------------------
the env object is constructed according to Rack specs: 
http://rack.rubyforge.org/doc/SPEC.html
If you want the full URI look into env["REQUEST_URI"]
----------------------------


However I don't find in the SPEC where the PATH_INFO must be cut before the 
first semicolon char.

I expect that is not correct at all since if ";" appears in the PATH (I mean 
before the query) then it's a valid char for the PATH.

In fact this issue is breaking my application since Rack receives requests 
containing ";" in the path like:

  GET /app/user%25b@id=%22ibc@domain.org;nat=yes%22%2d

which after hex-unescaping becomes:

  GET /app/user[@id="ibc@domain.org;nat=yes"]

Thin (and Mongrel) creates PATH_INFO as follows:

  /app/user%25b@id=%22ibc@domain.org


A workaround is forcing the client to escape the semicolon, but this violates 
the application specs as ";" is not required to be encoded in the HTTP uri (it 
could be encoded or not, but it's not required).


IMHO it's a bug in Rack SPEC (even if I don't find that point) or in 
Thin/Mongrel implementation.

Thanks a lot for any comment.


-- 
Iñaki Baz Castillo <ibc@aliax.net>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)?
  2009-11-24 18:22 Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? Iñaki Baz Castillo
@ 2009-11-30 16:27 ` Iñaki Baz Castillo
  2009-11-30 23:31   ` Hongli Lai
       [not found]   ` <8108c6090912061939h7f2ae70rac675219ed08f2e8@mail.gmail.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Iñaki Baz Castillo @ 2009-11-30 16:27 UTC (permalink / raw)
  To: Rack Development

Hi, could I get some reply to this mail please? I strongly think it could be a 
bug in Rack so I can't understand why it is ignored.

In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an 
allowed character into the URI path sections.

Regards.


El Martes, 24 de Noviembre de 2009, Iñaki Baz Castillo escribió:
> Hi, I've reported an issue for Thin HTTP server since it cuts PATH_INFO
>  after finding a ";":
> 
>   http://github.com/macournoyer/thin/issues/closed/#issue/6
> 
> ---------------------------
> If Thin receives a request with request line:
> 
>  GET /level1/level2;user=alice/level3/doc.txt HTTP/1.1
> 
> then Thin cuts the URI path and leaves:
>  env{"PATH_INFO"} => "/level1/level2"
> 
> Why? I've looked for such behavior in RFC 3986 and RFC 2616 but I don't
>  find a reason to remove all the path content after ";".
> ---------------------------
> 
> 
> A Thin developer closed the bug with the argument:
> 
> ----------------------------
> the env object is constructed according to Rack specs:
> http://rack.rubyforge.org/doc/SPEC.html
> If you want the full URI look into env["REQUEST_URI"]
> ----------------------------
> 
> 
> However I don't find in the SPEC where the PATH_INFO must be cut before the
> first semicolon char.
> 
> I expect that is not correct at all since if ";" appears in the PATH (I
>  mean before the query) then it's a valid char for the PATH.
> 
> In fact this issue is breaking my application since Rack receives requests
> containing ";" in the path like:
> 
>   GET /app/user%25b@id=%22ibc@domain.org;nat=yes%22%2d
> 
> which after hex-unescaping becomes:
> 
>   GET /app/user[@id="ibc@domain.org;nat=yes"]
> 
> Thin (and Mongrel) creates PATH_INFO as follows:
> 
>   /app/user%25b@id=%22ibc@domain.org
> 
> 
> A workaround is forcing the client to escape the semicolon, but this
>  violates the application specs as ";" is not required to be encoded in the
>  HTTP uri (it could be encoded or not, but it's not required).
> 
> 
> IMHO it's a bug in Rack SPEC (even if I don't find that point) or in
> Thin/Mongrel implementation.
> 
> Thanks a lot for any comment.
> 


-- 
Iñaki Baz Castillo <ibc@aliax.net>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Does Rack SPEC states that PATH_INFO must be cut after ";"  (semicolon)?
  2009-11-30 16:27 ` Iñaki Baz Castillo
@ 2009-11-30 23:31   ` Hongli Lai
  2009-12-01  0:42     ` Iñaki Baz Castillo
       [not found]   ` <8108c6090912061939h7f2ae70rac675219ed08f2e8@mail.gmail.com>
  1 sibling, 1 reply; 9+ messages in thread
From: Hongli Lai @ 2009-11-30 23:31 UTC (permalink / raw)
  To: Rack Development

On Nov 30, 5:27 pm, Iñaki Baz Castillo <i...@aliax.net> wrote:
> Hi, could I get some reply to this mail please? I strongly think it could be a
> bug in Rack so I can't understand why it is ignored.
>
> In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an
> allowed character into the URI path sections.

As far as I know the Rack spec doesn't mention anything about ";".
Neither do the CGI and WSGI specs for that matter:
http://hoohoo.ncsa.illinois.edu/cgi/env.html
http://www.python.org/dev/peps/pep-0333/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Does Rack SPEC states that PATH_INFO must be cut after ";"  (semicolon)?
  2009-11-30 23:31   ` Hongli Lai
@ 2009-12-01  0:42     ` Iñaki Baz Castillo
  2009-12-05 22:54       ` Scytrin dai Kinthra
  0 siblings, 1 reply; 9+ messages in thread
From: Iñaki Baz Castillo @ 2009-12-01  0:42 UTC (permalink / raw)
  To: rack-devel

2009/12/1 Hongli Lai <hongli@phusion.nl>:
> On Nov 30, 5:27 pm, Iñaki Baz Castillo <i...@aliax.net> wrote:
>> Hi, could I get some reply to this mail please? I strongly think it could be a
>> bug in Rack so I can't understand why it is ignored.
>>
>> In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an
>> allowed character into the URI path sections.
>
> As far as I know the Rack spec doesn't mention anything about ";".
> Neither do the CGI and WSGI specs for that matter:
> http://hoohoo.ncsa.illinois.edu/cgi/env.html
> http://www.python.org/dev/peps/pep-0333/

Yes, as I said I inspected the BNF grammar section for both URI and
HTTP specifications, and ";" is an allowed char into an URI segment.
But the fact is that Thin and Mongrel cut the PATH_INFO when finding a ";".

Unfortunatelly the author of Thin closed the bug:
  http://github.com/macournoyer/thin/issues/closed/#issue/6


-- 
Iñaki Baz Castillo
<ibc@aliax.net>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Does Rack SPEC states that PATH_INFO must be cut after ";"  (semicolon)?
  2009-12-01  0:42     ` Iñaki Baz Castillo
@ 2009-12-05 22:54       ` Scytrin dai Kinthra
  2009-12-05 23:15         ` Iñaki Baz Castillo
  0 siblings, 1 reply; 9+ messages in thread
From: Scytrin dai Kinthra @ 2009-12-05 22:54 UTC (permalink / raw)
  To: rack-devel

After going through the various handlers and routing middlewares, I am
unaware of any apps that modify the PATH_INFO in the way described.
The split of PATH_INFO at ';' seems to occur before the request hits
rack.

On Mon, Nov 30, 2009 at 16:42, Iñaki Baz Castillo <ibc@aliax.net> wrote:
> 2009/12/1 Hongli Lai <hongli@phusion.nl>:
>> On Nov 30, 5:27 pm, Iñaki Baz Castillo <i...@aliax.net> wrote:
>>> Hi, could I get some reply to this mail please? I strongly think it could be a
>>> bug in Rack so I can't understand why it is ignored.
>>>
>>> In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an
>>> allowed character into the URI path sections.
>>
>> As far as I know the Rack spec doesn't mention anything about ";".
>> Neither do the CGI and WSGI specs for that matter:
>> http://hoohoo.ncsa.illinois.edu/cgi/env.html
>> http://www.python.org/dev/peps/pep-0333/
>
> Yes, as I said I inspected the BNF grammar section for both URI and
> HTTP specifications, and ";" is an allowed char into an URI segment.
> But the fact is that Thin and Mongrel cut the PATH_INFO when finding a ";".
>
> Unfortunatelly the author of Thin closed the bug:
>  http://github.com/macournoyer/thin/issues/closed/#issue/6
>
>
> --
> Iñaki Baz Castillo
> <ibc@aliax.net>
>



-- 
stadik.net

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)?
  2009-12-05 22:54       ` Scytrin dai Kinthra
@ 2009-12-05 23:15         ` Iñaki Baz Castillo
  2009-12-07  2:32           ` Eric Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Iñaki Baz Castillo @ 2009-12-05 23:15 UTC (permalink / raw)
  To: rack-devel

El Sábado, 5 de Diciembre de 2009, Scytrin dai Kinthra escribió:
> After going through the various handlers and routing middlewares, I am
> unaware of any apps that modify the PATH_INFO in the way described.
> The split of PATH_INFO at ';' seems to occur before the request hits
> rack.

I'm really sorry as I forgot to update this thread. Thin's developer already 
confirmed me that this is a bug in Mongrel parser:

http://github.com/macournoyer/thin/issues#issue/7

The fact is that using Webrick the issue doesn't happen. It occurs with 
Mongrel itself and Thin as both uses Mongrel HTTP parser.

-- 
Iñaki Baz Castillo <ibc@aliax.net>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)?
  2009-12-05 23:15         ` Iñaki Baz Castillo
@ 2009-12-07  2:32           ` Eric Wong
  2009-12-07  9:53             ` Iñaki Baz Castillo
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Wong @ 2009-12-07  2:32 UTC (permalink / raw)
  To: rack-devel; +Cc: Marc-Andre Cournoyer

Iñaki Baz Castillo <ibc@aliax.net> wrote:
> El Sábado, 5 de Diciembre de 2009, Scytrin dai Kinthra escribió:
> > After going through the various handlers and routing middlewares, I am
> > unaware of any apps that modify the PATH_INFO in the way described.
> > The split of PATH_INFO at ';' seems to occur before the request hits
> > rack.
> 
> I'm really sorry as I forgot to update this thread. Thin's developer already 
> confirmed me that this is a bug in Mongrel parser:
> 
> http://github.com/macournoyer/thin/issues#issue/7
> 
> The fact is that using Webrick the issue doesn't happen. It occurs with 
> Mongrel itself and Thin as both uses Mongrel HTTP parser.

Hi,

This should be a one line fix for Thin, I just pushed out
a similar fix for unicorn.git (with tests).

From e8dd3e13b9a9f548a3138debd09e87fbb69e3998 Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Mon, 7 Dec 2009 02:20:18 +0000
Subject: [PATCH] http: PATH_INFO/REQUEST_PATH includes semi-colons

This is allowed according to RFC 2396, section 3.3 and matches
the behavior of URI.parse, as well.
---
 ext/unicorn_http/unicorn_http_common.rl |    2 +-
 lib/unicorn/app/old_rails/static.rb     |    6 +---
 test/unit/test_http_parser_ng.rb        |   46 +++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/ext/unicorn_http/unicorn_http_common.rl b/ext/unicorn_http/unicorn_http_common.rl
index 5d46087..041dfec 100644
--- a/ext/unicorn_http/unicorn_http_common.rl
+++ b/ext/unicorn_http/unicorn_http_common.rl
@@ -33,7 +33,7 @@
   query = ( uchar | reserved )* %query_string ;
   param = ( pchar | "/" )* ;
   params = ( param ( ";" param )* ) ;
-  rel_path = ( path? %request_path (";" params)? ) ("?" %start_query query)?;
+  rel_path = (path? (";" params)? %request_path) ("?" %start_query query)?;
   absolute_path = ( "/"+ rel_path );
   path_uri = absolute_path > mark %request_uri;
   Absolute_URI = (scheme "://" host_with_port path_uri);
diff --git a/lib/unicorn/app/old_rails/static.rb b/lib/unicorn/app/old_rails/static.rb
index 82f8aa5..13a435e 100644
--- a/lib/unicorn/app/old_rails/static.rb
+++ b/lib/unicorn/app/old_rails/static.rb
@@ -46,11 +46,7 @@ class Unicorn::App::OldRails::Static < Struct.new(:app, :root, :file_server)
     end
 
     # then try the cached version:
-
-    # grab the semi-colon REST operator used by old versions of Rails
-    # this is the reason we didn't just copy the new Rails::Rack::Static
-    env[REQUEST_URI] =~ /^#{Regexp.escape(path_info)}(;[^\?]+)/
-    path_info << "#$1#{ActionController::Base.page_cache_extension}"
+    path_info << ActionController::Base.page_cache_extension
 
     if File.file?("#{root}/#{::Rack::Utils.unescape(path_info)}")
       env[PATH_INFO] = path_info
diff --git a/test/unit/test_http_parser_ng.rb b/test/unit/test_http_parser_ng.rb
index e84c765..bb61e7f 100644
--- a/test/unit/test_http_parser_ng.rb
+++ b/test/unit/test_http_parser_ng.rb
@@ -371,4 +371,50 @@ class HttpParserNgTest < Test::Unit::TestCase
     assert ! parser.headers?
   end
 
+  def test_path_info_semicolon
+    qs = "QUERY_STRING"
+    pi = "PATH_INFO"
+    req = {}
+    str = "GET %s HTTP/1.1\r\nHost: example.com\r\n\r\n"
+    {
+      "/1;a=b?c=d&e=f" => { qs => "c=d&e=f", pi => "/1;a=b" },
+      "/1?c=d&e=f" => { qs => "c=d&e=f", pi => "/1" },
+      "/1;a=b" => { qs => "", pi => "/1;a=b" },
+      "/1;a=b?" => { qs => "", pi => "/1;a=b" },
+      "/1?a=b;c=d&e=f" => { qs => "a=b;c=d&e=f", pi => "/1" },
+      "*" => { qs => "", pi => "" },
+    }.each do |uri,expect|
+      assert_equal req, @parser.headers(req.clear, str % [ uri ])
+      @parser.reset
+      assert_equal uri, req["REQUEST_URI"], "REQUEST_URI mismatch"
+      assert_equal expect[qs], req[qs], "#{qs} mismatch"
+      assert_equal expect[pi], req[pi], "#{pi} mismatch"
+      next if uri == "*"
+      uri = URI.parse("http://example.com#{uri}")
+      assert_equal uri.query.to_s, req[qs], "#{qs} mismatch URI.parse disagrees"
+      assert_equal uri.path, req[pi], "#{pi} mismatch URI.parse disagrees"
+    end
+  end
+
+  def test_path_info_semicolon_absolute
+    qs = "QUERY_STRING"
+    pi = "PATH_INFO"
+    req = {}
+    str = "GET http://example.com%s HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
+    {
+      "/1;a=b?c=d&e=f" => { qs => "c=d&e=f", pi => "/1;a=b" },
+      "/1?c=d&e=f" => { qs => "c=d&e=f", pi => "/1" },
+      "/1;a=b" => { qs => "", pi => "/1;a=b" },
+      "/1;a=b?" => { qs => "", pi => "/1;a=b" },
+      "/1?a=b;c=d&e=f" => { qs => "a=b;c=d&e=f", pi => "/1" },
+    }.each do |uri,expect|
+      assert_equal req, @parser.headers(req.clear, str % [ uri ])
+      @parser.reset
+      assert_equal uri, req["REQUEST_URI"], "REQUEST_URI mismatch"
+      assert_equal "example.com", req["HTTP_HOST"], "Host: mismatch"
+      assert_equal expect[qs], req[qs], "#{qs} mismatch"
+      assert_equal expect[pi], req[pi], "#{pi} mismatch"
+    end
+  end
+
 end
-- 
Eric Wong

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)?
  2009-12-07  2:32           ` Eric Wong
@ 2009-12-07  9:53             ` Iñaki Baz Castillo
  0 siblings, 0 replies; 9+ messages in thread
From: Iñaki Baz Castillo @ 2009-12-07  9:53 UTC (permalink / raw)
  To: rack-devel

El Lunes, 7 de Diciembre de 2009, Eric Wong escribió:

> This should be a one line fix for Thin, I just pushed out
> a similar fix for unicorn.git (with tests).

> diff --git a/ext/unicorn_http/unicorn_http_common.rl
>  b/ext/unicorn_http/unicorn_http_common.rl index 5d46087..041dfec 100644
> --- a/ext/unicorn_http/unicorn_http_common.rl
> +++ b/ext/unicorn_http/unicorn_http_common.rl
> @@ -33,7 +33,7 @@
>    query = ( uchar | reserved )* %query_string ;
>    param = ( pchar | "/" )* ;
>    params = ( param ( ";" param )* ) ;
> -  rel_path = ( path? %request_path (";" params)? ) ("?" %start_query
>  query)?;
>  +  rel_path = (path? (";" params)? %request_path) ("?"
>  %start_query query)?; absolute_path = ( "/"+ rel_path );

Thanks, I've reported it to Mongrel and Thin developers.


-- 
Iñaki Baz Castillo <ibc@aliax.net>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Does Rack SPEC states that PATH_INFO must be cut after ";"  (semicolon)?
       [not found]   ` <8108c6090912061939h7f2ae70rac675219ed08f2e8@mail.gmail.com>
@ 2009-12-07 11:08     ` masayoshi takahashi
  0 siblings, 0 replies; 9+ messages in thread
From: masayoshi takahashi @ 2009-12-07 11:08 UTC (permalink / raw)
  To: rack-devel

FYI:
PSGI (Perl Web Server Gateway Interface Specification;
it's Perl version of Rack or WSGI) has no explicit definition
of PATH_INFO's separator, and Plack(PSGI reference implementation)
keep ";" in PATH_INFO.

cf.
http://search.cpan.org/~miyagawa/PSGI-1.03/PSGI.pod
http://github.com/miyagawa/Plack
http://bulknews.typepad.com/blog/2009/10/request_uri-will-be-in-mojo-support.html

Thanks,

Masayoshi Takahashi


2009/12/7 masayoshi takahashi <takahashimm@gmail.com>:
> Hi,
>
> 2009/12/1 Iñaki Baz Castillo <ibc@aliax.net>:
>> Hi, could I get some reply to this mail please? I strongly think it could be a
>> bug in Rack so I can't understand why it is ignored.
>>
>> In fact, I've re-checked HTTP and URI BNF grammar and the semicolon ";" is an
>> allowed character into the URI path sections.
>
> I've read RFC3986 "Uniform Resource Identifier (URI): Generic Syntax".
>
> * "3.3. Path" in RFC3986:
>
>   Aside from dot-segments in hierarchical paths, a path segment is
>   considered opaque by the generic syntax.  URI producing applications
>   often use the reserved characters allowed in a segment to delimit
>   scheme-specific or dereference-handler-specific subcomponents.  For
>   example, the semicolon (";") and equals ("=") reserved characters are
>   often used to delimit parameters and parameter values applicable to
>   that segment.  The comma (",") reserved character is often used for
>   similar purposes.  For example, one URI producer might use a segment
>   such as "name;v=1.1" to indicate a reference to version 1.1 of
>   "name", whereas another might use a segment such as "name,1.1" to
>   indicate the same.  Parameter types may be defined by scheme-specific
>   semantics, but in most cases the syntax of a parameter is specific to
>   the implementation of the URI's dereferencing algorithm.
>
> * "5.4.  Reference Resolution Examples" and "5.4.1.  Normal Examples"
> in RFC3986:
>
>   Within a representation with a well defined base URI of
>
>      http://a/b/c/d;p?q
>
>   a relative reference is transformed to its target URI as follows.
> (snip)
>      "g:h"           =  "g:h"
>      "g"             =  "http://a/b/c/g"
>      "./g"           =  "http://a/b/c/g"
>      "g/"            =  "http://a/b/c/g/"
>      "/g"            =  "http://a/g"
>      "//g"           =  "http://g"
>      "?y"            =  "http://a/b/c/d;p?y"
>      "g?y"           =  "http://a/b/c/g?y"
>      "#s"            =  "http://a/b/c/d;p?q#s"
>      "g#s"           =  "http://a/b/c/g#s"
>      "g?y#s"         =  "http://a/b/c/g?y#s"
>      ";x"            =  "http://a/b/c/;x"
>      "g;x"           =  "http://a/b/c/g;x"
>      "g;x?y#s"       =  "http://a/b/c/g;x?y#s"
> (snip)
>
> I think ";" is used separator like "?" in this document.
>
> And "2.3.  Specific Schemes and their Syntactic Categories" in RFC 1808:
>
>   NOTE: Section 5 of RFC 1738 specifies that the question-mark
>         character ("?") is allowed in an ftp or file path segment.
>         However, this is not true in practice and is believed to be an
>         error in the RFC.  Similarly, RFC 1738 allows the reserved
>         character semicolon (";") within an http path segment, but does
>         not define its semantics; the correct semantics are as defined
>         by this document for <params>.
>
> (notice: RFC1808 is obsoleted by RFC3986)
>
> I think ";"  is confused. So Rack SPEC should not mention about ";".
> But, in the real applications like Thin or others, it's not bad
> that ";" is treated as ordinary character, not PATH separator.
>
> Hope this help,
>
> Masayoshi Takahashi
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-12-07 13:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-24 18:22 Does Rack SPEC states that PATH_INFO must be cut after ";" (semicolon)? Iñaki Baz Castillo
2009-11-30 16:27 ` Iñaki Baz Castillo
2009-11-30 23:31   ` Hongli Lai
2009-12-01  0:42     ` Iñaki Baz Castillo
2009-12-05 22:54       ` Scytrin dai Kinthra
2009-12-05 23:15         ` Iñaki Baz Castillo
2009-12-07  2:32           ` Eric Wong
2009-12-07  9:53             ` Iñaki Baz Castillo
     [not found]   ` <8108c6090912061939h7f2ae70rac675219ed08f2e8@mail.gmail.com>
2009-12-07 11:08     ` masayoshi takahashi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).