From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 15 Jan 2010 08:46:20 -0800 (PST)
From: Gaius
To: Rack Development
Subject: Re: Trouble with Unicode in URLs
In-Reply-To: <201001151743.12887.ibc@aliax.net>
References: <201001151743.12887.ibc@aliax.net>
Reply-To: rack-devel@googlegroups.com
X-Thread-Url: http://groups.google.com/group/rack-devel/t/b1083d6710b82f9a
X-Message-Url: http://groups.google.com/group/rack-devel/msg/fca049c07cd7d99f

I agree with your analysis of _why_ the server is getting the
hex-escaped version. (That's why I used CGI.unescape to fix the problem.)
I'm also quite sure that Apache isn't unescaping before passing the
request on to Rack. My setup:

    $ apachectl -v
    Server version: Apache/2.2.13 (Unix)
    Server built: Sep 28 2009 16:04:37

    $ gem list passenger
    *** LOCAL GEMS ***
    passenger (2.2.4)

On Jan 15, 11:43 am, Iñaki Baz Castillo wrote:
> On Friday, 15 January 2010, Gaius wrote:
>
> > I have a Rails app in which I'd like to use some Unicode URLs:
> >
> >     # in routes.rb:
> >     map.resources 'proteges', :as => 'protégés', :only => [:index]
> >
> > When I go to http://localhost:3000/protégés, I get
> >
> >     No route matches "/prot%C3%A9g%C3%A9s" with {:method=>:get}
> >
> > That was on Mongrel,
>
> Unicode symbols are not allowed in a URL according to its BNF grammar, so the
> client (the web browser in your case) hex-escapes these symbols.
>
> That is, the client is sending a request like:
>
>   GET /prot%C3%A9g%C3%A9s HTTP/1.1
>
> which is correct.
>
> Then the server must hex-unescape it, and this is what you do with your Rack
> middleware :)
>
> Rack by itself doesn't require that the URL be hex-unescaped before
> passing it to the application, so it is a task for your application to do.
>
> > though I also tried Passenger.
>
> And the same happened? I don't think so, as Apache unescapes the URL before
> passing the request to the backend (in this case mod_rack). I've checked it
> before: when a request with a hex-escaped URL arrives at Apache, it unescapes
> it before passing the data to mod_rack, so you get the Rack variables
> hex-unescaped (you should already see the Unicode symbols).
>
> I wonder how it is possible for your Apache not to unescape the URL before
> passing it to Rack. Could you please re-check it? Which Apache version do you
> use?
>
> --
> Iñaki Baz Castillo
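
For reference, the CGI.unescape workaround mentioned at the top of this message could look roughly like the Rack middleware below. This is only a sketch under a few assumptions, not the exact code in use: the class name UnescapePathInfo is made up for illustration, and it assumes the server hands the path to Rack still percent-encoded in env['PATH_INFO']. Note that CGI.unescape also turns '+' into a space, which may not be what you want for paths that contain a literal plus sign.

```ruby
require 'cgi'

# Hypothetical middleware (the name is an assumption, not part of Rack):
# percent-decodes PATH_INFO before the request reaches the application,
# so that a route such as "/protégés" can match "/prot%C3%A9g%C3%A9s".
class UnescapePathInfo
  def initialize(app)
    @app = app
  end

  def call(env)
    path = env['PATH_INFO']
    if path && path.include?('%')
      # Work on a copy so the caller's env hash is left untouched.
      # Caveat: CGI.unescape also converts '+' to a space.
      env = env.merge('PATH_INFO' => CGI.unescape(path))
    end
    @app.call(env)
  end
end
```

In a config.ru this would be enabled with `use UnescapePathInfo` ahead of the application. If Apache or Passenger already unescapes the path, as suggested in the quoted reply, the middleware simply passes the request through unchanged.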