Hi Tobi,
Both %c3%a9 and %e9 could be valid -- it depends on the content of the page and what the server is prepared to accept. In short, a standards-conforming browser will choose which encoding to use based (mainly) on the form's accept-charset attribute. You can read the details here:
Also, note that while UTF-8 is indeed a character encoding, "Unicode" is a standard, not a character encoding itself. As such, it doesn't make sense to talk about é being "encoded in Unicode". It is true that é is the Unicode code point U+00E9, but code points are independent of encodings (they are just abstract numeric identifiers for particular characters). You can, however, say that é is encoded in ISO-8859-1 (and some other common encodings, such as Windows-1252) as the single byte hex E9, and in UTF-8 as the two bytes hex C3 A9. (See here for more:
http://www.joelonsoftware.com/articles/Unicode.html)
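To make the code point vs. encoding distinction concrete, here is a small Ruby snippet (assuming Ruby 1.9 or later, where strings carry an encoding) showing one code point, U+00E9, turning into different byte sequences depending on the target encoding:

```ruby
# One abstract code point...
e_acute = "\u00E9" # é, Unicode code point U+00E9

# ...which is just a number, independent of any encoding.
puts format("U+%04X", e_acute.ord) # U+00E9

# How that code point is stored as bytes depends on the chosen encoding.
utf8   = e_acute.encode("UTF-8").bytes      # two bytes
latin1 = e_acute.encode("ISO-8859-1").bytes # one byte

puts utf8.map { |b| format("%02X", b) }.join(" ")   # C3 A9
puts latin1.map { |b| format("%02X", b) }.join(" ") # E9
```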
So, with a form that specifies accept-charset="utf-8", é would be escaped as %c3%a9. With a form that specifies accept-charset="iso-8859-1", it would be escaped as %e9. Leaving it unescaped is technically invalid.
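As a rough sketch of the browser side, the hypothetical helper form_escape below converts the value to the form's charset and then percent-escapes every byte outside a small safe set, sending spaces as "+". The safe set is an assumption modeled loosely on application/x-www-form-urlencoded; real browsers differ in details. The hex digits come out uppercase, which is equivalent to the lowercase %c3%a9 above.

```ruby
# Hypothetical helper: percent-escape a form value roughly the way a browser
# would for a given accept-charset. A simplified sketch, not a spec-exact port.
def form_escape(value, charset)
  value.encode(charset).bytes.map do |b|
    if b == 0x20
      "+"                              # space is sent as "+"
    elsif b < 0x80 && b.chr =~ /[A-Za-z0-9*\-._]/
      b.chr                            # safe ASCII byte passes through as-is
    else
      format("%%%02X", b)              # any other byte becomes %XX
    end
  end.join
end

puts form_escape("\u00E9", "UTF-8")      # %C3%A9
puts form_escape("\u00E9", "ISO-8859-1") # %E9
```

Note that String#encode raises Encoding::UndefinedConversionError if the value contains a character the target charset cannot represent, so a real implementation would need a policy for that case.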
In order to process the data correctly, the server must know what encoding the incoming URL parameters are encoded in, typically either by convention or via HTTP headers. Rack has a deficiency here, in that Rack::Utils.unescape does not allow you to specify the encoding -- only UTF-8 is supported. The underlying API used by Rack, URI.decode_www_form_component, does allow you to specify the encoding, so if that matters to you, you may need to call it directly. (Though, as was discovered recently, it uses a regexp that is vulnerable to catastrophic backtracking.)
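On the server side, a minimal sketch might pass the encoding as the second argument to URI.decode_www_form_component (from Ruby's standard uri library; it defaults to UTF-8) and then transcode to UTF-8 for the rest of the app. The helpers charset_for and decode_param are hypothetical, and reading the charset from a Content-Type header with a UTF-8 fallback is just one assumed way of "knowing" the encoding; your application may rely on convention instead.

```ruby
require "uri"

# Hypothetical helper: take the encoding from a Content-Type header's
# charset parameter, falling back to UTF-8 by convention.
def charset_for(content_type)
  name = content_type.to_s[/charset=["']?([^;"'\s]+)/i, 1]
  name ? Encoding.find(name) : Encoding::UTF_8
rescue ArgumentError
  Encoding::UTF_8 # unknown charset name: fall back to the convention
end

# Hypothetical helper: decode in the declared encoding, then transcode to UTF-8.
def decode_param(raw, content_type)
  URI.decode_www_form_component(raw, charset_for(content_type)).encode(Encoding::UTF_8)
end

latin1_ct = "application/x-www-form-urlencoded; charset=ISO-8859-1"
puts decode_param("%e9", latin1_ct) # é
puts decode_param("%c3%a9", nil)    # é (UTF-8 by convention)

# Decoding the Latin-1 byte as if it were UTF-8 yields an invalid string:
p URI.decode_www_form_component("%e9").valid_encoding? # false
```

This is the distinction a UTF-8-only unescape cannot make: the same %e9 is a perfectly good é in ISO-8859-1 but garbage in UTF-8.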