rack-devel archive mirror (unofficial) https://groups.google.com/group/rack-devel
* Trouble with Unicode in URLs
@ 2010-01-15 16:26 Gaius
  2010-01-15 16:43 ` Iñaki Baz Castillo
  0 siblings, 1 reply; 11+ messages in thread
From: Gaius @ 2010-01-15 16:26 UTC
  To: Rack Development

I have a Rails app in which I'd like to use some Unicode URLs:

    # in routes.rb:
    map.resources 'proteges', :as => 'protégés', :only => [:index]

When I go to http://localhost:3000/protégés, I get

    No route matches "/prot%C3%A9g%C3%A9s" with {:method=>:get}

That was on Mongrel, though I also tried Passenger.  The fix was to
rewrite the REQUEST_URI environment variable in a Rack middleware:

    require 'cgi'

    class FixUnicodeUrlsMiddleware

      ENVIRONMENT_VARIABLES_TO_FIX = [
        'PATH_INFO', 'REQUEST_PATH', 'REQUEST_URI'
      ]

      def initialize(app)
        @app = app
      end

      def call(env)
        ENVIRONMENT_VARIABLES_TO_FIX.each do |var|
          # Only touch values that contain a %XX escape (two hex digits).
          env[var] = CGI.unescape(env[var]) if env[var] =~ /%[0-9A-Fa-f]{2}/
        end
        @app.call(env)
      end

    end

I'm sure that implementation could cause some problems, though.
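
To make that caveat concrete, here is a minimal sketch of three ways
form-style unescaping can change the meaning of a path. It uses
URI.decode_www_form_component from the standard library as a stand-in, on
the assumption that it treats "+" and "%XX" the same way CGI.unescape does.
The helper name form_unescape and the sample paths are made up for
illustration.

```ruby
require 'uri'

# Form-style decoding: "%XX" becomes the raw byte and "+" becomes a space.
# (form_unescape is a hypothetical helper name used only in this sketch.)
def form_unescape(value)
  URI.decode_www_form_component(value)
end

# Pitfall 1: a literal "+" in a path segment is turned into a space.
plus_path = form_unescape('/c++/intro')

# Pitfall 2: an escaped slash becomes a real separator, so one path
# segment silently turns into two.
slash_path = form_unescape('/files/a%2Fb')

# Pitfall 3: decoding twice (for example once in the web server and once
# in a middleware) changes the value again.
once  = form_unescape('/100%2525')
twice = form_unescape(once)

puts plus_path.inspect
puts slash_path.inspect
puts once.inspect
puts twice.inspect
```

Whether any of these matter depends on the routes in the application; they
are listed only as things to check before relying on a blanket unescape.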

See also my question on Stackoverflow:
http://stackoverflow.com/questions/2051553/how-do-i-use-utf-in-a-rails-url

Does anyone have any thoughts on how to add Unicode support to Rack?


* Re: Trouble with Unicode in URLs
  2010-01-15 16:26 Trouble with Unicode in URLs Gaius
@ 2010-01-15 16:43 ` Iñaki Baz Castillo
  2010-01-15 16:46   ` Gaius
  0 siblings, 1 reply; 11+ messages in thread
From: Iñaki Baz Castillo @ 2010-01-15 16:43 UTC
  To: rack-devel

On Friday, 15 January 2010, Gaius wrote:
> I have a Rails app in which I'd like to use some Unicode URLs:
> 
>     # in routes.rb:
>     map.resources 'proteges', :as => 'protégés', :only => [:index]
> 
> When I go to http://localhost:3000/protégés, I get
> 
>     No route matches "/prot%C3%A9g%C3%A9s" with {:method=>:get}
> 
> That was on Mongrel,

Non-ASCII characters are not allowed in a URL according to its BNF grammar,
so the client (the web browser in your case) hex-escapes them.

That is, the client sends a request like:

  GET /prot%C3%A9g%C3%A9s HTTP/1.1

which is correct.

The server must then hex-unescape it, and this is what your Rack middleware
does :)

Rack itself doesn't require the URL to be hex-unescaped before it is passed
to the application, so it is a task for your application to do it.
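
As a rough illustration of the round trip described above, the sketch below
percent-encodes the UTF-8 bytes of the path from this thread and decodes
them again. The method names percent_encode_path and percent_decode are
hypothetical helpers written for this sketch, not part of Rack or of any
browser, and real clients follow more detailed rules about which characters
they escape.

```ruby
# Escape every byte that is not an RFC 3986 unreserved character or a "/"
# separator. Multi-byte UTF-8 characters therefore yield one %XX per byte.
def percent_encode_path(path)
  path.each_byte.map do |byte|
    if byte < 0x80 && byte.chr.match?(/[A-Za-z0-9\-._~\/]/)
      byte.chr
    else
      format('%%%02X', byte)
    end
  end.join
end

# Turn each %XX back into its raw byte, then label the bytes as UTF-8.
# Unlike form decoding, this leaves "+" untouched.
def percent_decode(escaped)
  escaped.b.gsub(/%(\h\h)/) { $1.hex.chr }.force_encoding(Encoding::UTF_8)
end

# The path from this thread, written with \u escapes for the accented "e".
original = "/prot\u00E9g\u00E9s"
escaped  = percent_encode_path(original)
decoded  = percent_decode(escaped)

puts escaped
puts decoded == original
```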



> though I also tried Passenger.

And the same happened? I don't think so, as Apache unescapes the URL before
passing the request to the backend (in this case mod_rack). I've checked this
before: when a request with a hex-escaped URL arrives at Apache, it unescapes
the URL before passing the data to mod_rack, so you get the Rack variables
hex-unescaped (you should already see the Unicode characters).

I wonder how it is possible for your Apache not to unescape the URL before
passing it to Rack. Could you please re-check? Which Apache version do you use?
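
One way to re-check could be a tiny Rack-style endpoint that echoes the
path-related variables exactly as the server hands them over. This is only a
sketch: the names path_variables and dump_path_variables are invented here,
and the hash passed at the end merely simulates what a server that does not
unescape would supply.

```ruby
# Variables that the middleware earlier in this thread rewrites.
path_variables = %w[PATH_INFO REQUEST_PATH REQUEST_URI]

# A Rack application is any object that responds to call(env) and returns
# [status, headers, body]. This one reports the raw values it received.
dump_path_variables = lambda do |env|
  report = path_variables.map { |name| "#{name}=#{env[name].inspect}\n" }.join
  [200, { 'content-type' => 'text/plain; charset=utf-8' }, [report]]
end

# Simulated request environment with the path still hex-escaped.
status, _headers, body = dump_path_variables.call({
  'PATH_INFO'   => '/prot%C3%A9g%C3%A9s',
  'REQUEST_URI' => '/prot%C3%A9g%C3%A9s'
})

puts status
puts body.join
```

Served from a config.ru (for example with `run dump_path_variables`), the
same lambda could be requested on each machine to compare what Apache passes
along.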


-- 
Iñaki Baz Castillo <ibc@aliax.net>


* Re: Trouble with Unicode in URLs
  2010-01-15 16:43 ` Iñaki Baz Castillo
@ 2010-01-15 16:46   ` Gaius
  2010-01-15 17:03     ` Iñaki Baz Castillo
  0 siblings, 1 reply; 11+ messages in thread
From: Gaius @ 2010-01-15 16:46 UTC
  To: Rack Development

I agree with your analysis of _why_ the server is getting the hex-
escaped version. (That's why I used CGI.unescape to fix the problem.)
I'm also quite sure that Apache isn't unescaping before passing the
request on to Rack.

My setup:

$ apachectl -v
Server version: Apache/2.2.13 (Unix)
Server built:   Sep 28 2009 16:04:37

$ gem list passenger
*** LOCAL GEMS ***
passenger (2.2.4)



* Re: Trouble with Unicode in URLs
  2010-01-15 16:46   ` Gaius
@ 2010-01-15 17:03     ` Iñaki Baz Castillo
  2010-01-15 17:13       ` Gaius
  0 siblings, 1 reply; 11+ messages in thread
From: Iñaki Baz Castillo @ 2010-01-15 17:03 UTC
  To: rack-devel

On Friday, 15 January 2010, Gaius wrote:
> I agree with your analysis of _why_ the server is getting the hex-
> escaped version. (That's why I used CGI.unescape to fix the problem.)

Then what is the problem now? :)


> I'm also quite sure that Apache isn't unescaping before passing the
> request on to Rack.
> 
> My setup:
> 
> $ apachectl -v
> Server version: Apache/2.2.13 (Unix)
> Server built:   Sep 28 2009 16:04:37
> 
> $ gem list passenger
> *** LOCAL GEMS ***
> passenger (2.2.4)

Really interesting. Have you configured something in Apache2?
I have passenger 2.2.4 and apache2:

  $ apache2ctl -v
  Server version: Apache/2.2.11 (Ubuntu)
  Server built:   Nov 13 2009 22:06:57

In my case the URL is unescaped by Apache2 ¿?

-- 
Iñaki Baz Castillo <ibc@aliax.net>


* Re: Trouble with Unicode in URLs
  2010-01-15 17:03     ` Iñaki Baz Castillo
@ 2010-01-15 17:13       ` Gaius
  2010-01-15 17:16         ` Gaius
  2010-01-15 17:21         ` Gaius
  0 siblings, 2 replies; 11+ messages in thread
From: Gaius @ 2010-01-15 17:13 UTC
  To: Rack Development

Well, I don't have a problem. My point is that I had to build a
middleware to solve the problem. To me, that indicates that some part
of Rack (either core or contrib) might want this unencoding so others
don't have the same problem.  Of course, if it's really an httpd
problem, I'd rather solve it there.
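
If something like this were to live in Rack core or contrib, one possible
shape is sketched below. It is only an assumption about what such a
middleware could look like, not an existing Rack API: the class name
UnescapePathInfo is hypothetical, it touches only PATH_INFO, it leaves "+"
alone, and it falls back to the original value when the decoded bytes are
not valid UTF-8. It still cannot tell whether the web server has already
unescaped the path, so a literal "%41" in an already-decoded path would be
decoded a second time.

```ruby
# Hypothetical middleware: decodes %XX escapes in PATH_INFO only.
class UnescapePathInfo
  ESCAPED_BYTE = /%(\h\h)/

  def initialize(app)
    @app = app
  end

  def call(env)
    path = env['PATH_INFO']
    env['PATH_INFO'] = decode(path) if path
    @app.call(env)
  end

  private

  # Decode %XX sequences byte by byte, then reinterpret the result as UTF-8.
  # Returns the original string when nothing is escaped or when the decoded
  # bytes do not form valid UTF-8.
  def decode(path)
    raw = path.b
    return path unless raw.match?(ESCAPED_BYTE)

    decoded = raw.gsub(ESCAPED_BYTE) { $1.hex.chr }.force_encoding(Encoding::UTF_8)
    decoded.valid_encoding? ? decoded : path
  end
end

# Usage with a stand-in downstream app that reports the path it received.
app = UnescapePathInfo.new(->(env) { [200, {}, [env['PATH_INFO']]] })
_status, _headers, body = app.call({ 'PATH_INFO' => '/prot%C3%A9g%C3%A9s' })
puts body.first
```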

I haven't done anything to httpd.conf other than add Passenger, as
evidenced by the following diff:

$ diff /etc/apache2/httpd.conf{,.original}
486d485
< SSLSessionCache dbm:/var/log/apache2/ssl_gcache_data
490,507d488
<
< LoadModule passenger_module /Library/Ruby/Gems/1.8/gems/passenger-2.2.4/ext/apache2/mod_passenger.so
< <IfModule passenger_module>
< PassengerRoot /Library/Ruby/Gems/1.8/gems/passenger-2.2.4
< PassengerRuby /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
< </IfModule>
<
<
< # Added by the Passenger preference pane
< # Make sure to include the Passenger configuration (the LoadModule,
< # PassengerRoot, and PassengerRuby directives) before this section.
< <IfModule passenger_module>
<   NameVirtualHost *:80
<   <VirtualHost *:80>
<     ServerName _default_
<   </VirtualHost>
<   Include /private/etc/apache2/passenger_pane_vhosts/*.conf
< </IfModule>
\ No newline at end of file



* Re: Trouble with Unicode in URLs
  2010-01-15 17:13       ` Gaius
@ 2010-01-15 17:16         ` Gaius
  2010-01-15 17:48           ` Iñaki Baz Castillo
  2010-01-15 17:21         ` Gaius
  1 sibling, 1 reply; 11+ messages in thread
From: Gaius @ 2010-01-15 17:16 UTC
  To: Rack Development

I just found this older thread by you. If only we could just switch
computers!

http://groups.google.com/group/rack-devel/browse_thread/thread/d16abdccdb9026e8


* Re: Trouble with Unicode in URLs
  2010-01-15 17:13       ` Gaius
  2010-01-15 17:16         ` Gaius
@ 2010-01-15 17:21         ` Gaius
  2010-01-15 17:49           ` Iñaki Baz Castillo
  1 sibling, 1 reply; 11+ messages in thread
From: Gaius @ 2010-01-15 17:21 UTC
  To: Rack Development

One other interesting tidbit: it works great on my production Ubuntu
server:

PROD> apache2 -v
Server version: Apache/2.2.11 (Ubuntu)
Server built:   Aug 18 2009 14:28:29

It's only on my Mac that it fails. Hmm.


* Re: Trouble with Unicode in URLs
  2010-01-15 17:16         ` Gaius
@ 2010-01-15 17:48           ` Iñaki Baz Castillo
  0 siblings, 0 replies; 11+ messages in thread
From: Iñaki Baz Castillo @ 2010-01-15 17:48 UTC
  To: rack-devel

On Friday, 15 January 2010, Gaius wrote:
> I just found this older thread by you. If only we could just switch
> computers!
> 
> http://groups.google.com/group/rack-devel/browse_thread/thread/d16abdccdb9026e8

Yes. I also asked in the Apache IRC channel, and nobody could tell me how to
achieve it (that is, how to stop Apache from hex-unescaping the request URI).
In fact, based on the comments I received in that IRC session, I would have
thought it was not possible to disable it (until you said that in your case
Apache doesn't do it).

-- 
Iñaki Baz Castillo <ibc@aliax.net>


* Re: Trouble with Unicode in URLs
  2010-01-15 17:21         ` Gaius
@ 2010-01-15 17:49           ` Iñaki Baz Castillo
  2010-01-15 19:23             ` Gaius
  0 siblings, 1 reply; 11+ messages in thread
From: Iñaki Baz Castillo @ 2010-01-15 17:49 UTC
  To: rack-devel

On Friday, 15 January 2010, Gaius wrote:
> One other interesting tidbit: it works great on my production Ubuntu
> server:
> 
> PROD> apache2 -v
> Server version: Apache/2.2.11 (Ubuntu)
> Server built:   Aug 18 2009 14:28:29
> 
> It's only on my Mac that it fails. Hmm.

By "fails", do you mean that Apache leaves the URI hex-escaped?


-- 
Iñaki Baz Castillo <ibc@aliax.net>


* Re: Trouble with Unicode in URLs
  2010-01-15 17:49           ` Iñaki Baz Castillo
@ 2010-01-15 19:23             ` Gaius
  2010-01-15 21:08               ` Iñaki Baz Castillo
  0 siblings, 1 reply; 11+ messages in thread
From: Gaius @ 2010-01-15 19:23 UTC
  To: Rack Development

Correct. On Prod (Apache 2.2.11, Ubuntu), I get proper unescaping by
the time the request hits Rack. On Dev (Apache 2.2.13, OSX), I get hex-
escaped URLs in Rack.


* Re: Trouble with Unicode in URLs
  2010-01-15 19:23             ` Gaius
@ 2010-01-15 21:08               ` Iñaki Baz Castillo
  0 siblings, 0 replies; 11+ messages in thread
From: Iñaki Baz Castillo @ 2010-01-15 21:08 UTC
  To: rack-devel

On Friday, 15 January 2010, Gaius wrote:
> Correct. On Prod (Apache 2.2.11, Ubuntu), I get proper unescaping by
> the time the request hits Rack. On Dev (Apache 2.2.13, OSX), I get hex-
> escaped URLs in Rack.

Annoying... ¿?

-- 
Iñaki Baz Castillo <ibc@aliax.net>



Thread overview: 11+ messages
2010-01-15 16:26 Trouble with Unicode in URLs Gaius
2010-01-15 16:43 ` Iñaki Baz Castillo
2010-01-15 16:46   ` Gaius
2010-01-15 17:03     ` Iñaki Baz Castillo
2010-01-15 17:13       ` Gaius
2010-01-15 17:16         ` Gaius
2010-01-15 17:48           ` Iñaki Baz Castillo
2010-01-15 17:21         ` Gaius
2010-01-15 17:49           ` Iñaki Baz Castillo
2010-01-15 19:23             ` Gaius
2010-01-15 21:08               ` Iñaki Baz Castillo
