From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 13 Sep 2010 06:56:42 -0700 (PDT)
In-Reply-To: <729D0748-DCFF-4643-AB90-3250D40D2370@gmail.com>
References: <86810130-684d-413f-aa69-a56f170459e6@m1g2000vbh.googlegroups.com> <729D0748-DCFF-4643-AB90-3250D40D2370@gmail.com>
Message-ID: <019c87a8-c806-4ad9-8380-1cea72d0cea9@c13g2000vbr.googlegroups.com>
Subject: Re: Rack environment encoding
From: Hongli Lai
To: Rack Development
X-Original-Sender: hongli@phusion.nl
Reply-To: rack-devel@googlegroups.com

On Sep 12, 8:00 pm, James Tucker wrote:
> I'm not sure, but I think they don't expect them to be utf-8, they actually expect them to be compatible with literals.

Yes. I'm fine with US-ASCII for PATH_INFO and friends as long as comment #16 in the bug report doesn't result in breakage anymore.

> > If the app does
> > something like
> >   some_utf8_string + env['PATH_INFO']
> > then Ruby 1.9 will complain with an incompatible encoding error.
>
> On your system.

No.
Specifically, it breaks in Phusion Passenger because we set the encoding of the entire environment hash to binary, regardless of the system encoding, exactly to prevent data loss as you've mentioned earlier. However, setting everything to binary results in breakages as described in the bug report, which is the reason why I proposed setting some things to UTF-8/ASCII/whatever and other things to binary.

> Rails does a lot of work on the /client side/ to try and ensure it receives UTF-8, and tries to enforce UTF-8 elsewhere. Rack can't enforce this as it doesn't operate client side (build forms). It's also worth noting that rails accepts a percentile use case hit here, whereby it makes no attempt to expect full support for encodings that can't round-trip through unicode. For them this is sensible, and maybe it might be for us, but this is why I need particularly CP932 users to actually pay attention here. Until I hear from someone who deals with these issues in the real world, I cannot defer to the advice "just use unicode". Alas, one of the larger issues here is that I don't speak the languages required to actually track down most of these users, so I need help from people who do. I hope there's someone on this list proactive enough to do this, or knows someone to call on.

Whoa, I think we have a misunderstanding here. I started this thread to discuss what env['something'].encoding should return. Whether env['something'] actually contains UTF-8 data is a different discussion.

To reiterate: the problem that we're running into is that env['something'].encoding always returns #<Encoding:ASCII-8BIT> in Phusion Passenger, even if env['something'] contains valid UTF-8 data. Should env['something'] - assuming it contains valid UTF-8 data or ASCII data or whatever - have its #encoding return #?

Of course, the easiest way to solve this problem is to mandate all Rack web servers to set the encoding to binary and have the frameworks deal with conversions.
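
For readers following along, here is a minimal sketch of the breakage discussed above. It is not taken from Phusion Passenger or Rack. The env hash, the sample values, and the try_concat helper are all hypothetical, chosen only to show how Ruby (1.9 and later) treats a binary-tagged string. Note that the error only appears when both operands contain non-ASCII bytes; an ASCII-only binary string concatenates without complaint.

```ruby
# Hypothetical stand-in for a Rack env whose values a server has tagged
# as binary (ASCII-8BIT), regardless of what bytes they contain.
env = {
  'PATH_INFO'   => "/caf\xC3\xA9".dup.force_encoding(Encoding::BINARY), # valid UTF-8 bytes
  'SCRIPT_NAME' => "/index".dup.force_encoding(Encoding::BINARY)        # ASCII-only bytes
}

# A UTF-8 string with non-ASCII characters, as an app might hold.
some_utf8_string = "r\u00E9sum\u00E9"

# Hypothetical helper: return the concatenation, or the error it raised.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e
end

# Non-ASCII UTF-8 + non-ASCII binary: Ruby refuses to combine them.
result_binary = try_concat(some_utf8_string, env['PATH_INFO'])

# Non-ASCII UTF-8 + ASCII-only binary: compatible, result is UTF-8.
result_ascii = try_concat(some_utf8_string, env['SCRIPT_NAME'])

# Retagging changes only the encoding label, not the bytes. This is the
# kind of conversion a framework would perform if servers always hand
# over binary strings.
retagged = env['PATH_INFO'].dup.force_encoding(Encoding::UTF_8)
result_utf8 = try_concat(some_utf8_string, retagged)

puts result_binary.class
puts result_ascii.encoding
puts retagged.valid_encoding?
puts result_utf8.encoding
```

force_encoding never transcodes, so the retagged string is only usable if the bytes really are valid in the target encoding; valid_encoding? is the check for that.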