From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 28 Apr 2019 22:32:26 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: [RFC] www: set "<!DOCTYPE html>" everywhere
Message-ID: <20190428223226.GA22804@dcvr>
References: <20190427212334.uhc2z4ju6tivnrbl@whir>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190427212334.uhc2z4ju6tivnrbl@whir>

Eric Wong wrote:
> t/check-www-inbox.perl now runs tidy(1) for every text/html
> response, now.

tidy is definitely useful: it exposed a bug with unescaped URLs
in ExtMsg:

  https://public-inbox.org/git/20190428221229.22691-1-e@80x24.org/

> I'm no fan of the "Living Standard" quicksand that is HTML 5
> (or wasting 15 bytes on every response).  However, being easy
> to validate everything with tidy(1) seems alright...

However, the 15 wasted bytes at the beginning of every single
response still bother me.  AFAIK, every HTML renderer works fine
without the doctype.

Also, our Atom feeds use XHTML rather than HTML, since Atom feed
parsers must understand XML anyway and may get the non-overlapping
parts of HTML wrong.

So an alternative change could be to merely prepend
"<!DOCTYPE html>" to the response body before we spawn tidy:

> --- a/t/check-www-inbox.perl
> +++ b/t/check-www-inbox.perl
> @@ -205,5 +182,42 @@ sub worker_loop {
>  			my $c = Dumper($o);
>  			warn "bad: $u $c\n";
>  		}
> +		if ($tidy_check) {
> +			my $raw = $r->decoded_content;

			my $raw = '<!DOCTYPE html>' . $r->decoded_content;

> +			my ($out, $err) = ('', '');
> +			my $fail = $tidy_check->(\$raw, \$out, \$err);
> +			warn "Tidy ($fail) - $u - <1:$out> <2:$err>\n" if $fail;
> +		}
> +	}
> +}
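The alternative above keeps the doctype out of served responses and adds it only to the copy handed to tidy(1).  A minimal Perl sketch of that step, under stated assumptions: `for_tidy` and `$DOCTYPE` are hypothetical names, not part of public-inbox, and skipping bodies that already begin with a doctype is an added safeguard so the prepend can never be doubled:

```perl
use strict;
use warnings;

# The HTML5 doctype: exactly 15 bytes of ASCII.
my $DOCTYPE = '<!DOCTYPE html>';

# Hypothetical helper (not part of public-inbox): return a copy of
# $html suitable for feeding to tidy(1).  The doctype is prepended
# only when the body does not already start with one (case-insensitive),
# so the response served to clients is never modified.
sub for_tidy {
	my ($html) = @_;
	$html //= '';    # decoded_content may return undef
	return $html =~ /\A\s*<!DOCTYPE\s/i ? $html : $DOCTYPE . $html;
}
```

In worker_loop, the quoted line would then become `my $raw = for_tidy($r->decoded_content);`, leaving the rest of the tidy check unchanged.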