rack-devel archive mirror (unofficial) https://groups.google.com/group/rack-devel
* Rack environment encoding
@ 2010-09-12 16:48 Hongli Lai
  2010-09-12 18:00 ` James Tucker
  2010-09-13  4:21 ` Yehuda Katz
From: Hongli Lai @ 2010-09-12 16:48 UTC
  To: Rack Development

The current Rack specification doesn't say anything about the encoding
of the value strings in the Rack environment. However, from various bug
reports it has become clear that Rails and possibly many other apps
expect some value strings, such as REQUEST_URI, to be UTF-8. See #16
at http://code.google.com/p/phusion-passenger/issues/detail?id=404.

I believe the encoding should be standardized. Here are some ideas
that might serve as a starting point for discussion.

PATH_INFO and QUERY_STRING are usually extracted from REQUEST_URI.
However, REQUEST_URI is not standardized even though lots of people use
it. Furthermore, REQUEST_URI tends to be a URI. I therefore propose the
following requirements:

- REQUEST_URI, if present, MUST be a valid URI. This implies that
REQUEST_URI must contain the escaped (percent-encoded) form of the URI
(e.g. "/clubs/%C3%BC", not "/clubs/ü").
- All required Rack variables that are strings (PATH_INFO,
REQUEST_URI, etc) except for HTTP_ variables MUST be encoded as UTF-8.
That is, the #encoding method must return #<Encoding:UTF-8>.
- HTTP_ variables MUST be encoded as binary.
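The three requirements above could be checked mechanically. What follows is only a minimal sketch: the helper name encoding_violations and the UTF8_KEYS list are hypothetical and not part of Rack, and the ASCII-only test on REQUEST_URI is a necessary condition for a valid URI, not a full validation.

```ruby
# Assumed subset of Rack variables covered by the UTF-8 rule.
# This list is illustrative, not taken from the Rack specification.
UTF8_KEYS = %w[REQUEST_METHOD SCRIPT_NAME PATH_INFO QUERY_STRING
               SERVER_NAME SERVER_PORT REQUEST_URI].freeze

# Hypothetical helper: returns a list of human-readable violations of
# the proposed encoding rules for the given env hash.
def encoding_violations(env)
  env.each_with_object([]) do |(key, value), problems|
    next unless value.is_a?(String)

    if key.start_with?("HTTP_")
      # Header values are expected to be tagged as binary (ASCII-8BIT).
      problems << "#{key} must be binary" unless value.encoding == Encoding::BINARY
    elsif UTF8_KEYS.include?(key)
      problems << "#{key} must be UTF-8" unless value.encoding == Encoding::UTF_8
      # A valid URI is pure ASCII; this is only a rough approximation.
      if key == "REQUEST_URI" && !value.ascii_only?
        problems << "REQUEST_URI must be percent-encoded ASCII"
      end
    end
  end
end
```

A server or a lint-style middleware could call this on each request env and log or raise when the returned list is not empty.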

The valid URI requirement for REQUEST_URI guarantees that encoding it
in UTF-8 is possible, because URIs consist only of ASCII characters and
ASCII is a subset of UTF-8.
Because PATH_INFO and QUERY_STRING tend to be extracted from
REQUEST_URI and are therefore substrings of a URI, it is also
possible for them to be UTF-8.
The binary requirement for HTTP_ is necessary because HTTP allows
header values to contain characters that are neither valid UTF-8 nor
valid US-ASCII (see the HTTP grammar's TEXT rule).
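To illustrate the reasoning above, here is a small sketch. The helpers valid_utf8? and split_request_uri are hypothetical names made up for this example, and the header bytes are an invented sample of what a client might send in ISO-8859-1.

```ruby
# Hypothetical helper: reinterpret raw bytes as UTF-8 and report whether
# the result is well-formed. The input string is not modified.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: split a REQUEST_URI into its path and query
# parts and tag both as UTF-8. A missing part becomes an empty string.
def split_request_uri(request_uri)
  path, query = request_uri.split("?", 2)
  [path, query].map { |part| part.to_s.dup.force_encoding(Encoding::UTF_8) }
end

# A percent-encoded URI is pure ASCII, so its substrings are valid UTF-8.
path, query = split_request_uri("/clubs/%C3%BC?page=2".b)
path.valid_encoding?    # true
query.valid_encoding?   # true

# Sample header value: "café" in ISO-8859-1 (0xE9 is the "é").
header = "caf\xE9".b
valid_utf8?(header)     # false: a lone 0xE9 is not a complete UTF-8 sequence
header.valid_encoding?  # true: any byte sequence is valid as binary
```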

Non-HTTP_ required Rack values must not be ASCII-encoded because Rails
and many apps work primarily with UTF-8 strings. If the app does
something like

  some_utf8_string + env['PATH_INFO']

then Ruby 1.9 will raise an Encoding::CompatibilityError if both strings
contain non-ASCII bytes.
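The failure can be reproduced in isolation. This is a sketch with made-up sample values; concat_result is a hypothetical helper that returns either the concatenated string or the exception that was raised.

```ruby
# Hypothetical helper: try to concatenate two strings and return the
# exception instead of propagating it, so both outcomes can be inspected.
def concat_result(left, right)
  left + right
rescue Encoding::CompatibilityError => e
  e
end

utf8       = "caf\u00E9"           # UTF-8 string with a non-ASCII character
ascii_path = "/clubs".b            # binary-tagged, but ASCII-only content
raw_path   = "/clubs/\xC3\xBC".b   # binary-tagged, with non-ASCII bytes

# ASCII-only content is compatible with UTF-8, so this succeeds.
concat_result(utf8, ascii_path).encoding        # Encoding::UTF_8

# Both sides contain non-ASCII bytes and the encodings differ: this fails.
concat_result(utf8, raw_path).class             # Encoding::CompatibilityError

# Tagging the bytes as UTF-8 first makes the concatenation succeed.
concat_result(utf8, raw_path.dup.force_encoding(Encoding::UTF_8))
```

Note that the error depends on the string contents, not only on the encoding tags, which is why such bugs tend to show up only for requests with non-ASCII paths.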

On the other hand, if the app does something like

  some_utf8_string + env['HTTP_FOO_BAR']

then things will still blow up, so I'm not sure whether my requirement
makes sense. Does Rails mandate an encoding for its request.env?

I'm unsure what to do with all other variables. Should there be
requirements about their encodings?


Thread overview: 8+ messages
2010-09-12 16:48 Rack environment encoding Hongli Lai
2010-09-12 18:00 ` James Tucker
2010-09-12 18:23   ` Steve Klabnik
2010-09-13 13:56   ` Hongli Lai
2010-09-13  4:21 ` Yehuda Katz
2010-09-13  9:05   ` naruse
2010-09-13 14:08   ` Hongli Lai
2010-09-15  1:23     ` naruse
