From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 13 Sep 2010 07:08:03 -0700 (PDT)
References: <86810130-684d-413f-aa69-a56f170459e6@m1g2000vbh.googlegroups.com>
Message-ID: <7efe2e10-6029-4640-b1bb-1de6a38aeb58@a19g2000vbi.googlegroups.com>
Subject: Re: Rack environment encoding
From: Hongli Lai
To: Rack Development
X-Original-Sender: hongli@phusion.nl
Reply-To: rack-devel@googlegroups.com

On Sep 13, 6:21 am, Yehuda Katz wrote:
> I'm actually opposed to standardizing REQUEST_URI. It's always
> possible to extract REQUEST_URI from SCRIPT_NAME and PATH_INFO, and
> endpoints that rely on REQUEST_URI cannot be mounted. This was a
> serious problem for both Rails and Merb (before we both switched to
> using PATH_INFO).

I'm not proposing standardizing REQUEST_URI as a variable that must exist or standardizing its meaning. I'm only proposing standardizing its encoding, if it exists.
I was using REQUEST_URI as an example rationale to describe why PATH_INFO and QUERY_STRING should only contain ASCII characters (because they're extracted from a URI) and that therefore it is okay to encode PATH_INFO and QUERY_STRING as UTF-8 (or ASCII).

> The main Rack variables (REQUEST_METHOD, SCRIPT_NAME, PATH_INFO,
> QUERY_STRING, SERVER_NAME, and SERVER_PORT) should always be ASCII.
> These should be encoded as ASCII, and then the server should call
> encode!. This will have the effect of giving the end application the
> encoding that it expects (representing by Encoding.default_internal)
> or by leaving it in ASCII, which is the correct encoding.

I'm fine with ASCII for most variables, but are you sure SERVER_NAME should be ASCII as well? I don't have any strong opinions on this.

> > Because PATH_INFO and QUERY_STRING tend to be extracted from
> > REQUEST_URI and are therefore substrings of an URI, it is also
> > possible for them to be UTF-8.
>
> Again, I think the best way to achieve this would be to mark these as
> ASCII (which they actually are), and then let the application specify
> what transcoding it wants using the standard Ruby mechanism.

Fine with this.

> > Non-HTTP_ required Rack values must not be ASCII-encoded because Rails
> > and many apps work primarily with UTF-8 strings. If the app does
> > something like
> >
> >   some_utf8_string + env['PATH_INFO']
> >
> > then Ruby 1.9 will complain with an incompatible encoding error.
>
> Actually, ASCII and UTF-8 should always concatenate with no error.
> Maybe you're thinking about putting BINARY Strings through a Unicode
> regular expression?

Yeah, I wasn't thinking straight, sorry. Fine with ASCII for those.

> Concatenating UTF-8 and BINARY should blow up. As you pointed out, we
> can't be sure that HTTP_FOO_BAR *is* UTF-8.

Yes. After having given it some thought, I'm fine with it blowing up.
However, the encoding should be standardized so that all web servers consistently blow up, instead of the current situation where things blow up in web server A but not in web server B.

So here's a new proposal:

- All required Rack variables must be ASCII, except for HTTP_ variables.
- HTTP_ variables must be binary.
- REQUEST_URI, if it exists, must be ASCII. This is the only requirement; I'm not proposing standardizing its meaning or requiring that it exists.
- All other variables can have arbitrary encodings (i.e. no standardization).

Outstanding issues:

- Should SERVER_NAME be ASCII or UTF-8? I'm fine with either.
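
To make the proposal concrete, here is a minimal sketch of how a server might tag env values. The helper name tag_env_encodings and the REQUIRED_ASCII_KEYS constant are hypothetical and not part of Rack, and SERVER_NAME is treated as ASCII only for illustration, since that point is still open. The sketch only relabels strings with force_encoding; it does not validate their bytes.

```ruby
# Hypothetical helper, not part of Rack: tags env values with the
# encodings from the proposal above.
# Assumption: SERVER_NAME is treated as ASCII here (still an open issue).
REQUIRED_ASCII_KEYS = %w[
  REQUEST_METHOD SCRIPT_NAME PATH_INFO QUERY_STRING SERVER_NAME SERVER_PORT
].freeze

def tag_env_encodings(env)
  env.each_with_object({}) do |(key, value), tagged|
    tagged[key] =
      if !value.is_a?(String)
        value                                           # e.g. rack.version
      elsif key.start_with?("HTTP_")
        value.dup.force_encoding(Encoding::BINARY)      # HTTP_ variables: binary
      elsif REQUIRED_ASCII_KEYS.include?(key) || key == "REQUEST_URI"
        value.dup.force_encoding(Encoding::US_ASCII)    # required variables: ASCII
      else
        value                                           # everything else: untouched
      end
  end
end

env = tag_env_encodings({
  "PATH_INFO"   => "/caf%C3%A9",
  "HTTP_X_NAME" => "caf\xC3\xA9"
})

some_utf8_string = "h\u00E9llo "

# ASCII-only data concatenates with a UTF-8 string without error.
puts (some_utf8_string + env["PATH_INFO"]).encoding

# UTF-8 plus BINARY raises when both sides contain non-ASCII bytes.
begin
  some_utf8_string + env["HTTP_X_NAME"]
rescue Encoding::CompatibilityError => e
  puts "blew up: #{e.message}"
end
```

With every server tagging values this way, the failure in the second case would at least be the same everywhere. Note that Ruby only raises here because both operands contain non-ASCII bytes; a BINARY header value that happens to be pure ASCII still concatenates cleanly.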