From: Hongli Lai
To: Rack Development
Subject: Rack environment encoding
Date: Sun, 12 Sep 2010 09:48:51 -0700 (PDT)
Message-ID: <86810130-684d-413f-aa69-a56f170459e6@m1g2000vbh.googlegroups.com>
X-Original-Sender: hongli@phusion.nl

The current Rack specification doesn't say anything about the encoding of the value strings in the Rack environment. However, from various bug reports it has become clear that Rails and possibly many other apps expect some value strings, such as REQUEST_URI, to be UTF-8. See #16 at http://code.google.com/p/phusion-passenger/issues/detail?id=404.

I believe the encoding should be standardized. Here are some ideas that might serve as a starting point for discussion.

PATH_INFO and QUERY_STRING are usually extracted from REQUEST_URI; however, REQUEST_URI is not standardized even though lots of people use it. Furthermore, REQUEST_URI tends to be a URI. I therefore propose the following requirements:

- REQUEST_URI, if it exists, MUST be a valid URI.
  This implies that REQUEST_URI must contain the escaped (percent-encoded) form of the URI (e.g. "/clubs/%C3%BC", not "/clubs/ü").
- All required Rack variables that are strings (PATH_INFO, REQUEST_URI, etc.) except for HTTP_ variables MUST be encoded as UTF-8. That is, the #encoding method must return #<Encoding:UTF-8>.
- HTTP_ variables MUST be encoded as binary.

The valid URI requirement for REQUEST_URI guarantees that encoding it in UTF-8 is possible because URIs are valid ASCII. Because PATH_INFO and QUERY_STRING tend to be extracted from REQUEST_URI and are therefore substrings of a URI, it is also possible for them to be UTF-8.

The binary requirement for HTTP_ is necessary because HTTP allows header values to contain characters that are neither valid UTF-8 nor valid US-ASCII (see the HTTP grammar's TEXT rule).

Non-HTTP_ required Rack values must not be ASCII-encoded because Rails and many apps work primarily with UTF-8 strings. If the app does something like

  some_utf8_string + env['PATH_INFO']

then Ruby 1.9 will complain with an incompatible encoding error. On the other hand, if the app does something like

  some_utf8_string + env['HTTP_FOO_BAR']

then things will still blow up, so I'm not sure whether my requirement makes sense. Does Rails mandate an encoding for its request.env?

I'm unsure what to do with all other variables. Should there be requirements about their encodings?
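
To make the two concatenation cases concrete, here is a rough sketch of how Ruby 1.9 and later treat them. The env hash, the HTTP_FOO_BAR value and the try_concat helper are made up for illustration and are not taken from Rack or from any server; only the String and Encoding calls are real Ruby.

```ruby
# Hypothetical UTF-8 string, as an app such as Rails would typically hold.
some_utf8_string = "caf\u00e9 "

# Hypothetical Rack environment following the proposal above:
# URI-derived values are percent-encoded and tagged UTF-8,
# HTTP_ values are tagged binary (ASCII-8BIT).
env = {
  "PATH_INFO"    => "/clubs/%C3%BC".dup.force_encoding("UTF-8"),
  "HTTP_FOO_BAR" => "\xFC".dup.force_encoding("BINARY")
}

# Illustrative helper: returns the encoding name of the concatenation,
# or a short error description if Ruby refuses to combine the strings.
def try_concat(left, right)
  (left + right).encoding.to_s
rescue Encoding::CompatibilityError => e
  "error: #{e.class}"
end

# A percent-encoded URI consists of ASCII bytes only, so tagging it
# as UTF-8 is always valid.
puts env["PATH_INFO"].ascii_only?
puts env["PATH_INFO"].valid_encoding?

# UTF-8 + UTF-8: works.
puts try_concat(some_utf8_string, env["PATH_INFO"])

# UTF-8 (with non-ASCII characters) + binary (with a non-ASCII byte): raises.
puts try_concat(some_utf8_string, env["HTTP_FOO_BAR"])
```

Note that the failure only shows up when both sides contain non-ASCII bytes. A binary header value that happens to be pure ASCII concatenates with a UTF-8 string without complaint, which is part of why these bugs tend to surface late.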