From: Ian Kelling <ian@iankelling.org>
To: "Jakub Narębski" <jnareb@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH] gitweb: use highlight's shebang detection
Date: Wed, 21 Sep 2016 15:15:42 -0700
Message-ID: <1474496142.400086.733142417.560B3AFF@webmail.messagingengine.com>
In-Reply-To: <108ce713-337a-801a-6c3b-089ef25a3883@gmail.com>
On Tue, Sep 20, 2016, at 01:22 PM, Jakub Narębski wrote:
> On 06.09.2016 at 21:00, Ian Kelling wrote:
>
> > The highlight binary can detect language by shebang when we can't tell
> > the syntax type by the name of the file.
>
> Was it something always present among highlight[1] binary capabilities,
> or is it something present only in new enough highlight app? Or only
> in some specific fork / specific binary? I couldn't find language
> detection in highlight[1] documentation...
>
> [1]: http://www.andre-simon.de/doku/highlight/en/highlight.php
Search for the word "shebang" on that page; it's mentioned twice.
>
> If this feature is available only for some version, or for some
> highlighters, gitweb would have to provide an option to configure
> it. It might be an additional configuration variable, it might
> be a special value in the %highlight_basename or %highlight_ext.
Good question. It was added upstream in 2007, and I tested that it
works on the oldest distros I have easy access to: Ubuntu 14.04
and Debian wheezy.
>
> > To use highlight's shebang
> > detection, add highlight to the pipeline whenever highlight is enabled.
>
> This describes what this patch does, but the sentence feels
> a bit convoluted, as it is stated.
>
Agreed. I've changed it in v2 of the patch, and perhaps this will make
the rest of the patch clearer too. The new paragraph is:
The highlight binary can detect language by shebang when we can't tell
the syntax type by the name of the file. In that case, pass the blob
to "highlight --force" and the resulting html will have markup for
highlighting if the language was detected.
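To illustrate the choice that paragraph describes, here is a minimal
shell sketch. gitweb itself does this in Perl inside run_highlighter();
the function name highlight_syntax_arg is made up for this example, and
an empty string stands in for Perl's undefined $syntax. It only builds
the argument string and does not run highlight:

```shell
#!/bin/sh
# Hypothetical helper: choose the argument passed to "highlight".
# $1 is the syntax guessed from the file name; empty if unknown.
highlight_syntax_arg () {
    if [ -n "$1" ]; then
        # Syntax known from basename or extension: request it explicitly.
        printf '%s\n' "--syntax $1"
    else
        # Unknown: let highlight try its own shebang detection and
        # still produce output if nothing matches.
        printf '%s\n' "--force"
    fi
}

highlight_syntax_arg sh    # known syntax
highlight_syntax_arg ""    # unknown syntax
```

The result would then be appended to the existing
"--replace-tabs=8 --fragment" arguments in the pipeline.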
> >
> > Document the shebang detection and add a test which exercises it in
> > t/t9500-gitweb-standalone-no-errors.sh.
>
> Nice!
>
> >
> > Signed-off-by: Ian Kelling <ian@iankelling.org>
> > ---
> >
> > Notes:
> > I wondered if adding highlight to the pipeline would make viewing a blob
> > with no highlighting take longer but it did not on my computer. I found
> > no noticeable impact on small files and strangely, on a 159k file, it
> > took 7% less time averaged over several requests.
>
> Strange. I would guess that invoking a separate binary and perl would
> always add to the time (especially on operating systems where forking /
> running a command is expensive... though those are not often used with
> web servers, are they).
I dug into this a little more, and I think it's because when we call
highlight, we later call sanitize() instead of esc_html(). sanitize() is
faster and makes up for the extra time highlight takes. I ran a test on
my machine calling sanitize and esc_html on each line of gitweb.perl 100
times: 7.4s for sanitize, 12.4s for esc_html.
>
> >
> > Documentation/gitweb.conf.txt | 21 ++++++++++++++-------
> > gitweb/gitweb.perl | 10 +++++-----
> > t/t9500-gitweb-standalone-no-errors.sh | 18 +++++++++++++-----
> > 3 files changed, 32 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
> > Note that 'highlight' feature must be set for gitweb to actually
> > use syntax highlighting.
> > +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile"). The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax <syntax>` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight". The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default. There are three options for adding syntax
> > +detection. The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh"). The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is the name of the syntax to be passed via `--syntax <syntax>`
> > +to "highlight". The last priority is the "highlight" configuration of
> > +`Shebang` regular expressions to detect the language based on the first
> > +line in the file, (for example, matching the line "#!/bin/bash"). See
> > +the highlight documentation and the default config at
> > +/etc/highlight/filetypes.conf for more details.
>
> All right; in addition to expanding the docs, it also improves them.
Noted in v2 commit log.
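For readers skimming the doc change, the lookup order it describes can
be sketched roughly as below. This is shell, not gitweb's Perl; the
function name guess_syntax is hypothetical, and the few entries shown
only stand in for whatever is configured in %highlight_basename and
%highlight_ext:

```shell
#!/bin/sh
# Hypothetical illustration of the detection order: basename first,
# then extension. An empty result means "leave it to highlight's
# Shebang rules" (that is, run it with --force).
guess_syntax () {
    name=${1##*/}                  # strip any leading directories

    # First priority: full basename (stands in for %highlight_basename).
    case "$name" in
    Makefile) echo make; return ;;
    esac

    # Second priority: extension (stands in for %highlight_ext).
    case "$name" in
    *.sh) echo sh ;;
    *.h)  echo c ;;
    *)    echo "" ;;               # no match: fall back to shebang
    esac
}

guess_syntax Makefile
guess_syntax t/test.sh
guess_syntax test
```

A file named "test" with a "#!/bin/sh" first line gets no syntax from
either table, which is the case the patch hands over to highlight.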
>
> > +
> > For example if repositories you are hosting use "phtml" extension for
> > PHP files, and you want to have correct syntax-highlighting for those
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 33d701d..a672181 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -3931,15 +3931,16 @@ sub guess_file_syntax {
> > # or return original FD if no highlighting
> > sub run_highlighter {
> > my ($fd, $highlight, $syntax) = @_;
> > - return $fd unless ($highlight && defined $syntax);
> > + return $fd unless ($highlight);
>
> Here we would have check if we want / can invoke "highlight".
I think it's right as is. $highlight says the user wants highlighting,
and now we still want to invoke it when we do not know $syntax.
While I was double checking, I noticed there was an unused parameter to
guess_file_syntax(), $mimetype, which could easily make this not
obvious. I removed it in v2.
>
> >
> > close $fd;
> > + my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
> > open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
> > quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
> > '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
> > '--', "-fe=$fallback_encoding")." | ".
> > quote_command($highlight_bin).
> > - " --replace-tabs=8 --fragment --syntax $syntax |"
> > + " --replace-tabs=8 --fragment $syntax_arg |"
> > or die_error(500, "Couldn't open file or run syntax highlighter");
> > return $fd;
> > }
>
> All right (well, except for the question asked at the beginning).
>
> > @@ -7063,8 +7064,7 @@ sub git_blob {
> >
> > my $highlight = gitweb_check_feature('highlight');
> > my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
> > - $fd = run_highlighter($fd, $highlight, $syntax)
> > - if $syntax;
>
> Hmmm... it looks like the old code checked if there was $syntax defined
> twice: once for truthy value in caller, once for definedness in
> run_highlighter().
>
> > + $fd = run_highlighter($fd, $highlight, $syntax);
>
> All right.
>
> >
> > git_header_html(undef, $expires);
> > my $formats_nav = '';
> > @@ -7117,7 +7117,7 @@ sub git_blob {
> > $line = untabify($line);
> > printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
> > $nr, esc_attr(href(-replay => 1)), $nr, $nr,
> > - $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
> > + $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
>
> Oh, well. It looks like checking if highlighter could be run in
> run_highlighter() is wrong, as the caller (that is, git_blob()) needs
> to know if it is using "highlight" output (which is HTML) or raw blob
> contents (which needs to be HTML-escaped).
Per my previous comment, run_highlighter() is right, and we use the same
condition here to know whether the highlight binary was used. If
highlight was run with --force and did not detect a language from the
shebang, it still outputs html, just without the highlight markup.
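Put differently, the caller only needs to know whether highlight
produced the line. A rough shell sketch of that decision follows; the
names are made up, the escaping is reduced to three characters, and the
"yes" branch simply passes the line through where gitweb would call
sanitize():

```shell
#!/bin/sh
# Hypothetical sketch: escape a line only when highlight was NOT used.
esc_html () {
    # Escape the characters that matter in HTML text; "&" goes first so
    # the entities produced for "<" and ">" are not escaped again.
    printf '%s\n' "$1" |
        sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

render_line () {
    # $1: "yes" if the line came from highlight, $2: the line itself
    if [ "$1" = yes ]; then
        printf '%s\n' "$2"    # already HTML; gitweb sanitize()s here
    else
        esc_html "$2"         # raw blob content must be escaped
    fi
}

render_line no  '#include <stdio.h>'
render_line yes '<span class="kwd">if</span>'
```

The span markup in the second call is only an illustrative stand-in for
whatever highlight emits.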
>
> > }
> > }
> > close $fd
> > diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
> > index e94b2f1..9e5fcfe 100755
> > --- a/t/t9500-gitweb-standalone-no-errors.sh
> > +++ b/t/t9500-gitweb-standalone-no-errors.sh
> > @@ -702,12 +702,20 @@ test_expect_success HIGHLIGHT \
> > gitweb_run "p=.git;a=blob;f=file"'
> >
> > test_expect_success HIGHLIGHT \
> > - 'syntax highlighting (highlighted, shell script)' \
> > + 'syntax highlighting (highlighted, shell script shebang)' \
>
> It would be nice to have in test name that it checks if highlighter
> autodetection works, or at least doesn't crash gitweb.
I've updated it to:
syntax highlighting (highlighter language autodetection)
I'm happy to use any suggestion you have.
>
> > 'git config gitweb.highlight yes &&
> > - echo "#!/usr/bin/sh" > test.sh &&
> > - git add test.sh &&
> > - git commit -m "Add test.sh" &&
> > - gitweb_run "p=.git;a=blob;f=test.sh"'
> > + echo "#!/usr/bin/sh" > test &&
> > + git add test &&
> > + git commit -m "Add test" &&
> > + gitweb_run "p=.git;a=blob;f=test"'
> > +
> > +test_expect_success HIGHLIGHT \
> > + 'syntax highlighting (highlighted, header file)' \
>
> Do we check explicit syntax knowledge (based on the extension),
> or autodetect again?
We have explicit syntax knowledge here. My thinking was that this would
modify the existing test so that it highlights a different language than
the autodetected one, but the patch is simpler if I just make the
autodetected one a different language. I've done that in v2.
>
> > + 'git config gitweb.highlight yes &&
> > + echo "#define ANSWER 42" > test.h &&
> > + git add test.h &&
> > + git commit -m "Add test.h" &&
> > + gitweb_run "p=.git;a=blob;f=test.h"'
> >
> > # ----------------------------------------------------------------------
> > # forks of projects
> >
>
> Thank you for your work on this patch,
> --
> Jakub Narębski
Thank you for reviewing it!