rack-devel archive mirror (unofficial) https://groups.google.com/group/rack-devel
* How to handle unicode escaped params?
@ 2011-07-25 14:22 Tobias Bielohlawek
  2011-07-25 18:48 ` John Firebaugh
  0 siblings, 1 reply; 4+ messages in thread
From: Tobias Bielohlawek @ 2011-07-25 14:22 UTC (permalink / raw)
  To: Rack Development

I have a question about the correct behavior for handling unicode-escaped
query params.

Let's take the character 'é'. In UTF-8 it is escaped as '%c3%a9'; in
unicode (a single byte holding the code point U+00E9, as in ISO-8859-1)
as '%e9'. As a real-life example, we're seeing these URLs:

1. not escaped     -> http://soundcloud.com/search?q%5Bfulltext%5D=café
2. UTF-8 escaped   -> http://soundcloud.com/search?q%5Bfulltext%5D=caf%c3%a9
3. Unicode escaped -> http://soundcloud.com/search?q%5Bfulltext%5D=caf%E9
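
To make the difference between forms 2 and 3 concrete, here is a small
sketch that percent-encodes the raw bytes of 'é' in two encodings. The
helper name percent_encode_bytes is made up for illustration and is not
part of Rack. The single-byte form matches ISO-8859-1 (Latin-1), whose
byte for 'é' happens to equal the Unicode code point U+00E9.

```ruby
# Hypothetical helper (not a Rack API): percent-encode every byte of a string.
def percent_encode_bytes(str)
  str.bytes.map { |b| format("%%%02X", b) }.join
end

e_acute = "\u00e9" # 'é' as a UTF-8 string

# UTF-8 stores 'é' as two bytes, 0xC3 0xA9.
puts percent_encode_bytes(e_acute)                               # %C3%A9

# ISO-8859-1 stores 'é' as the single byte 0xE9.
puts percent_encode_bytes(e_acute.encode(Encoding::ISO_8859_1))  # %E9
```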

Using rack 1.3.1, the first two cases are processed correctly, but the
third fails with the error 'incorrect UTF-8 byte sequence'. That is
correct behavior in the first place, as the input is not UTF-8 but
unicode.

I'm now wondering what the best solution is here. Why not be smart and
give it another run, unescaping the URL on the assumption that it's
unicode encoded?
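
A minimal sketch of that "second run" idea, assuming the fallback
encoding is ISO-8859-1 (the single-byte form '%E9' decodes to 'é'
there). The method name unescape_with_fallback and its fallback
parameter are hypothetical; this is not Rack's implementation, only an
illustration of the approach.

```ruby
# Hypothetical helper (not a Rack API): unescape percent-escapes, prefer
# UTF-8, and fall back to a single-byte encoding when the bytes are not
# valid UTF-8. The ISO-8859-1 default is an assumption.
def unescape_with_fallback(str, fallback = Encoding::ISO_8859_1)
  # Work on raw bytes so "%E9" becomes the single byte 0xE9.
  bytes = str.b.gsub(/%(\h\h)/) { $1.hex.chr }

  # First run: interpret the bytes as UTF-8.
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Second run: reinterpret the same bytes in the fallback encoding
  # and transcode the result to UTF-8.
  bytes.force_encoding(fallback).encode(Encoding::UTF_8)
end

puts unescape_with_fallback("caf%c3%a9") # café
puts unescape_with_fallback("caf%E9")    # café
```

The fallback is only a guess: a byte sequence that happens to be valid
UTF-8 is never reinterpreted, so input that really was Latin-1 can still
be decoded to the wrong characters.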

I checked other players. Twitter, for example, has the same error (but
fails silently):
http://twitter.com/search/é
http://twitter.com/search/%c3%a9
http://twitter.com/search/%E9

but Google does not:
http://www.google.de/search?q=é
http://www.google.de/search?q=%c3%a9
http://www.google.de/search?q=%E9


Any advice is very much appreciated. A patch is coming.

Thx - Tobi


Thread overview: 4+ messages
2011-07-25 14:22 How to handle unicode escaped params? Tobias Bielohlawek
2011-07-25 18:48 ` John Firebaugh
2011-07-26  9:07   ` Tobias Bielohlawek | SoundCloud
2011-07-26  9:23     ` Evgeni Dzhelyov
