From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 25 Jul 2011 07:22:34 -0700 (PDT)
Message-ID: <871fffb6-96c1-425e-a99e-735ec3aaef80@p20g2000yqp.googlegroups.com>
Subject: How to handle unicode escaped params?
From: Tobias Bielohlawek
To: Rack Development
Reply-To: rack-devel@googlegroups.com

I have a question about the correct behavior when handling Unicode-encoded
query params. Take the character 'é': percent-encoded as UTF-8 it is
'%c3%a9'; percent-encoded by its Unicode code point (U+00E9, which is also
its ISO-8859-1 byte) it is '%e9'.

As a real-life example, we're seeing these URLs:

1. not escaped     -> http://soundcloud.com/search?q%5Bfulltext%5D=café
2. UTF-8 escaped   -> http://soundcloud.com/search?q%5Bfulltext%5D=caf%c3%a9
3. Unicode escaped -> http://soundcloud.com/search?q%5Bfulltext%5D=caf%E9

Using rack 1.3.1, the first two cases are processed correctly, but the last
one fails with the error 'incorrect UTF-8 byte sequence'. That is correct
behavior in the first place, since the input is not UTF-8 but the raw code
point.

I'm now wondering what the best solution is here. Why not be smart and give
it another run, unescaping the URL on the assumption that it's Unicode
encoded?

I checked other players, e.g.
Twitter has the same error (but fails silently):

http://twitter.com/search/é
http://twitter.com/search/%c3%a9
http://twitter.com/search/%E9

but not Google, which does it this way:

http://www.google.de/search?q=é
http://www.google.de/search?q=%c3%a9
http://www.google.de/search?q=%E9

Any advice is very much appreciated. A patch is coming.

Thx
- Tobi
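
To make the "give it another run" idea concrete, here is a minimal sketch of
what such a fallback might look like. It is not the announced patch and not
Rack's code: unescape_with_fallback is a hypothetical helper name, and
treating the second attempt as ISO-8859-1 is an assumption based on '%E9'
being both the code point and the Latin-1 byte for 'é'.

```ruby
require 'uri'

# Hypothetical helper, not part of Rack: percent-decode a query component,
# try the resulting bytes as UTF-8 first, and fall back to ISO-8859-1.
def unescape_with_fallback(str)
  # Decode the %XX escapes into raw bytes without assuming an encoding yet.
  bytes = URI.decode_www_form_component(str, Encoding::BINARY)

  # First run: interpret the bytes as UTF-8 and keep them if they are valid.
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Second run (assumption): treat each byte as an ISO-8859-1 character
  # and transcode the result to UTF-8.
  bytes.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

puts unescape_with_fallback("caf%c3%a9")
puts unescape_with_fallback("caf%E9")
```

This is only a heuristic: a Latin-1 byte sequence that happens to also be
valid UTF-8 would be taken as UTF-8, so the fallback can guess wrong.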