git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ian Kelling <ian@iankelling.org>
To: "Jakub Narębski" <jnareb@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
Date: Sat, 24 Sep 2016 15:34:23 -0700	[thread overview]
Message-ID: <1474756463.3745393.735971833.7FD5B5C9@webmail.messagingengine.com> (raw)
In-Reply-To: <2a4c3efb-2145-b699-c980-3079f165a6e1@gmail.com>

On Fri, Sep 23, 2016, at 03:15 PM, Jakub Narębski wrote:
> W dniu 23.09.2016 o 11:08, Ian Kelling napisał:
>
> > The "highlight" binary can, in some cases, determine the language type
> > by the means of file contents, for example the shebang in the first line
> > for some scripting languages.  Make use of this autodetection for files
> > which syntax is not known by gitweb.  In that case, pass the blob
> > contents to "highlight --force"; the parameter is needed to make it
> > always generate HTML output (which includes HTML-escaping).
>
> Right.
>
> >
> > Although we now run highlight on files which do not end up highlighted,
> > performance is virtually unaffected because when we call highlight, we
> > also call sanitize() instead of esc_html(), which is significantly
> > slower.
>
> This paragraph is a bit unclear, for example it is not obvious what
> "..., which is significantly slower" refers to: sanitize() or esc_html().
>
> I think it would be better to write:
>
>   Although we now run highlight on files which do not end up highlighted,
>   performance is virtually unaffected because when we call highlight, it
>   is used for escaping HTML.  In the case that highlight is used, gitweb
>   calls sanitize() instead of esc_html(), and the latter is significantly
>   slower (it does more, being roughly a superset of sanitize()).

Agree. Done in v4.

>
> >        After curling blob view of unhighlighted large and small text
> > files of perl code and license text 100 times each on a local
> > Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in
> > request time for all file types.
>
> Also, "curling" is not the word I would like to see. I would say:
>
>   Simple benchmark comparing performance of 'blob' view of files without
>   syntax highlighting in gitweb before and after this change indicates
>   ±1% difference in request time for all file types.  Benchmark was
>   performed on local instance on Debian, using Apache/2.4.23 web server
>   and CGI/PSGI/FCGI/mod_perl.
>
>       ^^^^^^^^^^^^^^^^^^^^^^--- select one
>
> Or something like that; I'm not sure how detailed this should be.
> But it is nice to have such benchmark in the commit message.


Sounds  good. Used it in v4.

>
> Anyway I think that adding yet another configuration toggle for selecting
> whether to use "highlight" syntax autodetection or not would be just an
> unnecessary complication.
>
> Note that the performance loss might be quite higher on MS Windows, with
> its higher cost of fork.  But then they probably do not configure
> server-side highligher anyway.
>
> >
> > Document the feature and improve syntax highlight documentation, add
> > test to ensure gitweb doesn't crash when language detection is used.
>
> Good.
>
> >
> > Signed-off-by: Ian Kelling <ian@iankelling.org>
> > ---
> >  Documentation/gitweb.conf.txt          | 21 ++++++++++++++-------
> >  gitweb/gitweb.perl                     | 10 +++++-----
> >  t/t9500-gitweb-standalone-no-errors.sh |  8 ++++++++
> >  3 files changed, 27 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
>
> We should probably say what does it mean to be "highlight"[1] compatible,
> but it is outside of scope for this patch, and I think also out of scope
> of this series.
>
> >  	Note that 'highlight' feature must be set for gitweb to actually
> >  	use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax <syntax>` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh").  The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is the name of the syntax to be passed via `--syntax <syntax>`
> > +to "highlight".  The last priority is the "highlight" configuration of
> > +`Shebang` regular expressions to detect the language based on the first
> > +line in the file, (for example, matching the line "#!/bin/bash").  See
> > +the highlight documentation and the default config at
> > +/etc/highlight/filetypes.conf for more details.
>
> All right. I guess /etc/highlight/filetypes.conf is the standard
> location?

I think so. I checked packages from homebrew, fedora, suse, debian to
confirm.

>
> >  +
> >  For example if repositories you are hosting use "phtml" extension for
> >  PHP files, and you want to have correct syntax-highlighting for those
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 6cb4280..44094f4 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -3931,15 +3931,16 @@ sub guess_file_syntax {
> >  # or return original FD if no highlighting
> >  sub run_highlighter {
> >  	my ($fd, $highlight, $syntax) = @_;
> > -	return $fd unless ($highlight && defined $syntax);
> > +	return $fd unless ($highlight);
> >
> >  	close $fd;
> > +	my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
> >  	open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
> >  	          quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
> >  	            '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
> >  	            '--', "-fe=$fallback_encoding")." | ".
> >  	          quote_command($highlight_bin).
> > -	          " --replace-tabs=8 --fragment --syntax $syntax |"
> > +	          " --replace-tabs=8 --fragment $syntax_arg |"
> >  		or die_error(500, "Couldn't open file or run syntax highlighter");
> >  	return $fd;
> >  }
>
> All right, nice and understandable.
>
> > @@ -7063,8 +7064,7 @@ sub git_blob {
> >
> >  	my $highlight = gitweb_check_feature('highlight');
> >  	my $syntax = guess_file_syntax($highlight, $file_name);
> > -	$fd = run_highlighter($fd, $highlight, $syntax)
> > -		if $syntax;
> > +	$fd = run_highlighter($fd, $highlight, $syntax);
> >
> >  	git_header_html(undef, $expires);
> >  	my $formats_nav = '';
>
> Good, run unconditionally.
>
> > @@ -7117,7 +7117,7 @@ sub git_blob {
> >  			$line = untabify($line);
> >  			printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
> >  			       $nr, esc_attr(href(-replay => 1)), $nr, $nr,
> > -			       $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
> > +			       $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
> >  		}
>
> Good, use highlighter if possible, not only if syntax is known
> and highlighter is turned on.
>
> Nice and easy to understand after earlier change.
>
> >  	}
> >  	close $fd
> > diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> > index e94b2f1..6d06ed9 100755
> > --- a/t/t9500-gitweb-standalone-no-errors.sh
> > +++ b/t/t9500-gitweb-standalone-no-errors.sh
> > @@ -709,6 +709,14 @@ test_expect_success HIGHLIGHT \
> >  	 git commit -m "Add test.sh" &&
> >  	 gitweb_run "p=.git;a=blob;f=test.sh"'
> >
> > +test_expect_success HIGHLIGHT \
> > +	'syntax highlighting (highlighter language autodetection)' \
> > +	'git config gitweb.highlight yes &&
> > +	 echo "#!/usr/bin/perl" > test &&
> > +	 git add test &&
> > +	 git commit -m "Add test" &&
> > +	 gitweb_run "p=.git;a=blob;f=test"'
>
> Nice.
>
> > +
> >  # ----------------------------------------------------------------------
> >  # forks of projects
> >
> >
>

Thanks for great  suggestions.

  parent reply	other threads:[~2016-09-24 22:34 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-06 19:00 [PATCH] gitweb: use highlight's shebang detection Ian Kelling
2016-09-20 20:22 ` Jakub Narębski
2016-09-21 16:38   ` Junio C Hamano
2016-09-21 17:51     ` Jakub Narębski
2016-09-21 22:15   ` Ian Kelling
2016-09-21 22:18   ` Ian Kelling
2016-09-21 22:24     ` Ian Kelling
2016-09-22 22:50     ` [PATCH v2] " Jakub Narębski
2016-09-23  9:08       ` Ian Kelling
2016-09-23  9:08       ` [PATCH v3 1/2] gitweb: remove unused function parameter Ian Kelling
2016-09-23  9:08         ` [PATCH v3 2/2] gitweb: use highlight's shebang detection Ian Kelling
2016-09-23 22:15           ` Jakub Narębski
2016-09-24 16:21             ` Jakub Narębski
2016-09-24 17:52               ` Junio C Hamano
2016-09-24 22:35               ` Ian Kelling
2016-09-24 22:34             ` Ian Kelling [this message]
2016-09-24 22:32           ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Ian Kelling
2016-09-24 22:32             ` [PATCH v4 2/2] gitweb: use highlight's shebang detection Ian Kelling
2016-09-25 18:04               ` Jakub Narębski
2016-09-28  7:37                 ` Ian Kelling
2016-09-25 17:57             ` [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter Jakub Narębski
2016-09-23 19:44         ` [PATCH v3 1/2] gitweb: remove unused function parameter Jakub Narębski
2016-09-23 19:57           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1474756463.3745393.735971833.7FD5B5C9@webmail.messagingengine.com \
    --to=ian@iankelling.org \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).