rack-devel archive mirror (unofficial) https://groups.google.com/group/rack-devel
From: Hongli Lai <hongli@phusion.nl>
To: Rack Development <rack-devel@googlegroups.com>
Subject: Rack environment encoding
Date: Sun, 12 Sep 2010 09:48:51 -0700 (PDT)
Message-ID: <86810130-684d-413f-aa69-a56f170459e6@m1g2000vbh.googlegroups.com>

The current Rack specification doesn't say anything about the encoding
of the value strings in the Rack environment. However, from various bug
reports it has become clear that Rails and possibly many other apps
expect some value strings, such as REQUEST_URI, to be UTF-8. See #16
at http://code.google.com/p/phusion-passenger/issues/detail?id=404.

I believe the encoding should be standardized. Here are some ideas
that might serve as a starting point for discussion.

PATH_INFO and QUERY_STRING are usually extracted from REQUEST_URI.
However, REQUEST_URI is not standardized even though lots of people use
it. Furthermore, REQUEST_URI tends to be a URI. I therefore propose the
following requirements:

- REQUEST_URI, if it exists, MUST be a valid URI. This implies that
REQUEST_URI must contain the escaped (percent-encoded) form of the URI
(e.g. "/clubs/%C3%BC", not "/clubs/ü").
- All required Rack variables that are strings (PATH_INFO,
REQUEST_URI, etc) except for HTTP_ variables MUST be encoded as UTF-8.
That is, the #encoding method must return #<Encoding:UTF-8>.
- HTTP_ variables MUST be encoded as binary.
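
The three requirements above could be checked mechanically. Here is a
minimal sketch, assuming a hypothetical helper named encoding_violations
and an illustrative list of non-HTTP_ variables; neither is part of Rack.
The ASCII-only test is only a necessary condition for a valid URI, not a
full URI validation.

```ruby
# Hypothetical helper, not part of Rack: lists the entries of a Rack env
# that violate the encoding requirements proposed above.
def encoding_violations(env)
  # Assumed set of non-HTTP_ string variables to check (illustrative only).
  utf8_keys = %w[REQUEST_METHOD SCRIPT_NAME PATH_INFO QUERY_STRING
                 SERVER_NAME SERVER_PORT REQUEST_URI]
  violations = []

  env.each do |key, value|
    next unless value.is_a?(String)

    if key.start_with?('HTTP_')
      # Proposed rule: header values are tagged as binary.
      unless value.encoding == Encoding::BINARY
        violations << "#{key} is #{value.encoding}, expected binary"
      end
    elsif utf8_keys.include?(key)
      # Proposed rule: tagged as UTF-8, and the bytes must be valid UTF-8.
      if value.encoding != Encoding::UTF_8
        violations << "#{key} is #{value.encoding}, expected UTF-8"
      elsif !value.valid_encoding?
        violations << "#{key} contains bytes that are not valid UTF-8"
      end
    end
  end

  # A valid URI is ASCII-only; this is necessary but not sufficient.
  uri = env['REQUEST_URI']
  if uri.is_a?(String) && !uri.ascii_only?
    violations << 'REQUEST_URI contains non-ASCII characters'
  end

  violations
end

# Example env that satisfies the proposal (values are made up).
env = {
  'PATH_INFO'    => '/clubs/%C3%BC'.encode('UTF-8'),
  'REQUEST_URI'  => '/clubs/%C3%BC'.encode('UTF-8'),
  'HTTP_FOO_BAR' => 'baz'.b
}
puts encoding_violations(env).inspect
```

A server or a development-only middleware could run such a check once per
request and log the result.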

The valid URI requirement for REQUEST_URI guarantees that encoding it
in UTF-8 is possible, because URIs are valid ASCII and every ASCII
string is also valid UTF-8.
Because PATH_INFO and QUERY_STRING tend to be extracted from
REQUEST_URI and are therefore substrings of a URI, it is also
possible for them to be UTF-8.
The binary requirement for HTTP_ is necessary because HTTP allows
header values to contain characters that are neither valid UTF-8 nor
valid US-ASCII (see the HTTP grammar's TEXT rule).
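
To illustrate the reasoning above, here is a small sketch. The helper
valid_as? and the sample header value are made up for this example. It
only re-tags bytes with an encoding and asks Ruby whether the result is
well-formed.

```ruby
# Hypothetical helper: would these raw bytes form a valid string if they
# were tagged with the given encoding? The bytes are never changed.
def valid_as?(bytes, encoding)
  bytes.dup.force_encoding(encoding).valid_encoding?
end

escaped_uri   = '/clubs/%C3%BC'.b  # percent-encoded, so ASCII bytes only
latin1_header = "J\xFCrgen".b      # made-up header value, raw ISO-8859-1 "ü"

valid_as?(escaped_uri, Encoding::UTF_8)      # => true
valid_as?(latin1_header, Encoding::UTF_8)    # => false
valid_as?(latin1_header, Encoding::US_ASCII) # => false
valid_as?(latin1_header, Encoding::BINARY)   # => true
```

Only the binary tag accepts every possible header byte, which is the
motivation for treating HTTP_ variables differently.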

Non-HTTP_ required Rack values must not be ASCII-encoded because Rails
and many apps work primarily with UTF-8 strings. If the app does
something like

  some_utf8_string + env['PATH_INFO']

then Ruby 1.9 will complain with an incompatible encoding error
(Encoding::CompatibilityError) whenever both strings contain non-ASCII
bytes.

On the other hand, if the app does something like

  some_utf8_string + env['HTTP_FOO_BAR']

then things will still blow up, so I'm not sure whether my requirement
makes sense. Does Rails mandate an encoding for its request.env?
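
For reference, here is a small sketch of how concatenation behaves, as far
as I can tell. The helper concat_encoding and the sample values are made
up for this example. Whether Ruby raises depends on the content of the
strings as well as on their encoding tags: an operand that contains only
ASCII bytes is treated as compatible.

```ruby
# Hypothetical helper: returns the encoding of a + b, or :incompatible
# if Ruby refuses to concatenate the two strings.
def concat_encoding(a, b)
  (a + b).encoding
rescue Encoding::CompatibilityError
  :incompatible
end

utf8          = "caf\u00E9 "                                   # UTF-8, non-ASCII
ascii_path    = '/clubs'.dup.force_encoding(Encoding::US_ASCII) # ASCII bytes only
binary_plain  = 'plain'.b                                      # binary, ASCII bytes only
binary_header = "J\xFCrgen".b                                  # binary, has a high byte

concat_encoding(utf8, ascii_path)    # => Encoding::UTF_8
concat_encoding(utf8, binary_plain)  # => Encoding::UTF_8
concat_encoding(utf8, binary_header) # => :incompatible
```

So a binary HTTP_ value only causes trouble when it actually carries
non-ASCII bytes and is combined with a non-ASCII UTF-8 string.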

I'm unsure what to do with all other variables. Should there be
requirements about their encodings?

Thread overview: 8+ messages
2010-09-12 16:48 Hongli Lai [this message]
2010-09-12 18:00 ` Rack environment encoding James Tucker
2010-09-12 18:23   ` Steve Klabnik
2010-09-13 13:56   ` Hongli Lai
2010-09-13  4:21 ` Yehuda Katz
2010-09-13  9:05   ` naruse
2010-09-13 14:08   ` Hongli Lai
2010-09-15  1:23     ` naruse
