git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/3] gitweb: Be smarter about linking to SHA1s in log messages
@ 2016-10-06  9:11 Ævar Arnfjörð Bjarmason
  2016-10-06  9:11 ` [PATCH v2 1/3] gitweb: Fix a typo in a comment Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-10-06  9:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski,
	Ævar Arnfjörð Bjarmason

This is v2 of patches I sent on September 21st starting at
<20160921114428.28664-1-avarab@gmail.com>. Jakub Narębski had a lot of
feedback for that series (thanks!). Which as far as I can tell I've
incorporated entirely in this re-roll.

Ævar Arnfjörð Bjarmason (3):
  gitweb: Fix a typo in a comment
  gitweb: Link to 7-char+ SHA1s, not only 8-char+
  gitweb: Link to "git describe"'d commits in log messages

 gitweb/gitweb.perl | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

-- 
2.9.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/3] gitweb: Fix a typo in a comment
  2016-10-06  9:11 [PATCH v2 0/3] gitweb: Be smarter about linking to SHA1s in log messages Ævar Arnfjörð Bjarmason
@ 2016-10-06  9:11 ` Ævar Arnfjörð Bjarmason
  2016-10-14 17:34   ` Jakub Narębski
  2016-10-06  9:11 ` [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+ Ævar Arnfjörð Bjarmason
  2016-10-06  9:11 ` [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-10-06  9:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski,
	Ævar Arnfjörð Bjarmason

Change a typo'd MIME type in a comment. The Content-Type is
application/xhtml+xml, not application/xhtm+xml.

Fixes up code originally added in 53c4031 ("gitweb: Strip
non-printable characters from syntax highlighter output", 2011-09-16).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 44094f4..cba7405 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1616,7 +1616,7 @@ sub esc_path {
 	return $str;
 }
 
-# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
+# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)
 sub sanitize {
 	my $str = shift;
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
  2016-10-06  9:11 [PATCH v2 0/3] gitweb: Be smarter about linking to SHA1s in log messages Ævar Arnfjörð Bjarmason
  2016-10-06  9:11 ` [PATCH v2 1/3] gitweb: Fix a typo in a comment Ævar Arnfjörð Bjarmason
@ 2016-10-06  9:11 ` Ævar Arnfjörð Bjarmason
  2016-10-06 19:44   ` Junio C Hamano
  2016-10-14 17:45   ` Jakub Narębski
  2016-10-06  9:11 ` [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages Ævar Arnfjörð Bjarmason
  2 siblings, 2 replies; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-10-06  9:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski,
	Ævar Arnfjörð Bjarmason

Change the minimum length of an abbreviated object identifier in the
commit message gitweb tries to turn into link from 8 hexchars to 7.

This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb:
SHA-1 in commit log message links to "object" view", 2006-12-10), but
the default abbreviation length is 7, and has been for a long time.

It's still possible to reference SHA1s down to 4 characters in length,
see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
git actually produce that, so I doubt anyone is putting that into log
messages in practice, but people definitely do put 7 character SHA1s
into log messages.

I think it's fairly dubious to link to things matching [0-9a-fA-F]
here as opposed to just [0-9a-f], that dates back to the initial
version of gitweb from 161332a ("first working version",
2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
them as far as I can tell.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cba7405..92b5e91 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2036,7 +2036,7 @@ sub format_log_line_html {
 	my $line = shift;
 
 	$line = esc_html($line, -nbsp=>1);
-	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
+	$line =~ s{\b([0-9a-fA-F]{7,40})\b}{
 		$cgi->a({-href => href(action=>"object", hash=>$1),
 					-class => "text"}, $1);
 	}eg;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
  2016-10-06  9:11 [PATCH v2 0/3] gitweb: Be smarter about linking to SHA1s in log messages Ævar Arnfjörð Bjarmason
  2016-10-06  9:11 ` [PATCH v2 1/3] gitweb: Fix a typo in a comment Ævar Arnfjörð Bjarmason
  2016-10-06  9:11 ` [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+ Ævar Arnfjörð Bjarmason
@ 2016-10-06  9:11 ` Ævar Arnfjörð Bjarmason
  2016-10-06 19:51   ` Junio C Hamano
  2016-10-14 20:06   ` Jakub Narębski
  2 siblings, 2 replies; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-10-06  9:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jakub Narebski,
	Ævar Arnfjörð Bjarmason

Change the log formatting function to know about "git describe" output
such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".

There are still many valid refnames that we don't link to
e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
similarly it's trivially possible to create some refnames like
"æ/var-gf6727b0" or which won't be picked up by this regex.

There's surely room for improvement here, but I just wanted to address
the very common case of sticking "git describe" output into commit
messages without trying to link to all possible refnames, that's going
to be a rather futile exercise given that this is free text, and it
would be prohibitively expensive to look up whether the references in
question exist in our repository.

There was on-list discussion about how we could do better than this
patch. Junio suggested to update parse_commits() to call a new
"gitweb--helper" command which would pass each of the revision
candidates through "rev-parse --verify --quiet". That would cut down
on our false positives (e.g. we'll link to "deadbeef"), and also allow
us to be more aggressive in selecting candidate revisions.

That may be too expensive to work in practice, or it may
not. Investigating that would be a good follow-up to this patch.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 gitweb/gitweb.perl | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 92b5e91..7cf68f0 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2036,10 +2036,24 @@ sub format_log_line_html {
 	my $line = shift;
 
 	$line = esc_html($line, -nbsp=>1);
-	$line =~ s{\b([0-9a-fA-F]{7,40})\b}{
+	$line =~ s{
+        \b
+        (
+            # The output of "git describe", e.g. v2.10.0-297-gf6727b0
+            # or hadoop-20160921-113441-20-g094fb7d
+            (?<!-) # see strbuf_check_tag_ref(). Tags can't start with -
+            [A-Za-z0-9.-]+
+            (?!\.) # refs can't end with ".", see check_refname_format()
+            -g[0-9a-fA-F]{7,40}
+            |
+            # Just a normal looking Git SHA1
+            [0-9a-fA-F]{7,40}
+        )
+        \b
+    }{
 		$cgi->a({-href => href(action=>"object", hash=>$1),
 					-class => "text"}, $1);
-	}eg;
+	}egx;
 
 	return $line;
 }
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
  2016-10-06  9:11 ` [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+ Ævar Arnfjörð Bjarmason
@ 2016-10-06 19:44   ` Junio C Hamano
  2016-10-14 17:45   ` Jakub Narębski
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2016-10-06 19:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jakub Narebski

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the minimum length of an abbreviated object identifier in the
> commit message gitweb tries to turn into link from 8 hexchars to 7.
>
> This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb:
> SHA-1 in commit log message links to "object" view", 2006-12-10), but
> the default abbreviation length is 7, and has been for a long time.
>
> It's still possible to reference SHA1s down to 4 characters in length,
> see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
> git actually produce that, so I doubt anyone is putting that into log
> messages in practice, but people definitely do put 7 character SHA1s
> into log messages.
>
> I think it's fairly dubious to link to things matching [0-9a-fA-F]
> here as opposed to just [0-9a-f], that dates back to the initial
> version of gitweb from 161332a ("first working version",
> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
> them as far as I can tell.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---

s/SHA1/SHA-1/g.  I agree that A-F are dubious.

>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index cba7405..92b5e91 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>  	my $line = shift;
>  
>  	$line = esc_html($line, -nbsp=>1);
> -	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> +	$line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>  		$cgi->a({-href => href(action=>"object", hash=>$1),
>  					-class => "text"}, $1);
>  	}eg;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
  2016-10-06  9:11 ` [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages Ævar Arnfjörð Bjarmason
@ 2016-10-06 19:51   ` Junio C Hamano
  2016-10-09 11:20     ` Ævar Arnfjörð Bjarmason
  2016-10-14 20:06   ` Jakub Narębski
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-10-06 19:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jakub Narebski

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the log formatting function to know about "git describe" output
> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
>
> There are still many valid refnames that we don't link to
> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
> similarly it's trivially possible to create some refnames like
> "æ/var-gf6727b0" or which won't be picked up by this regex.

Not a serious counter-proposal or suggestion, and certainly not an
objection to the patch I am responding to, but I wonder if it is so
bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08.

IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we
allowed an optional 'g' in front of the hex, e.g.

-	$line =~ s{\b([0-9a-fA-F]{7,40})\b}{
+	$line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{

wouldn't that be much simpler, covers more cases and sufficient?

> There's surely room for improvement here, but I just wanted to address
> the very common case of sticking "git describe" output into commit
> messages without trying to link to all possible refnames, that's going
> to be a rather futile exercise given that this is free text, and it
> would be prohibitively expensive to look up whether the references in
> question exist in our repository.
>
> There was on-list discussion about how we could do better than this
> patch. Junio suggested to update parse_commits() to call a new
> "gitweb--helper" command which would pass each of the revision
> candidates through "rev-parse --verify --quiet". That would cut down
> on our false positives (e.g. we'll link to "deadbeef"), and also allow
> us to be more aggressive in selecting candidate revisions.
>
> That may be too expensive to work in practice, or it may
> not. Investigating that would be a good follow-up to this patch.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  gitweb/gitweb.perl | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 92b5e91..7cf68f0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>  	my $line = shift;
>  
>  	$line = esc_html($line, -nbsp=>1);
> -	$line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> +	$line =~ s{
> +        \b
> +        (
> +            # The output of "git describe", e.g. v2.10.0-297-gf6727b0
> +            # or hadoop-20160921-113441-20-g094fb7d
> +            (?<!-) # see strbuf_check_tag_ref(). Tags can't start with -
> +            [A-Za-z0-9.-]+
> +            (?!\.) # refs can't end with ".", see check_refname_format()
> +            -g[0-9a-fA-F]{7,40}
> +            |
> +            # Just a normal looking Git SHA1
> +            [0-9a-fA-F]{7,40}
> +        )
> +        \b
> +    }{
>  		$cgi->a({-href => href(action=>"object", hash=>$1),
>  					-class => "text"}, $1);
> -	}eg;
> +	}egx;
>  
>  	return $line;
>  }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
  2016-10-06 19:51   ` Junio C Hamano
@ 2016-10-09 11:20     ` Ævar Arnfjörð Bjarmason
  2016-10-14 17:50       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-10-09 11:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git, Jakub Narebski

On Thu, Oct 6, 2016 at 9:51 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Change the log formatting function to know about "git describe" output
>> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
>>
>> There are still many valid refnames that we don't link to
>> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
>> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
>> similarly it's trivially possible to create some refnames like
>> "æ/var-gf6727b0" or which won't be picked up by this regex.
>
> Not a serious counter-proposal or suggestion, and certainly not an
> objection to the patch I am responding to, but I wonder if it is so
> bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08.
>
> IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we
> allowed an optional 'g' in front of the hex, e.g.
>
> -       $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> +       $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{
>
> wouldn't that be much simpler, covers more cases and sufficient?

It would be more simpler, maybe that's the right approach. I have a
preference for making the entire reference a link. I think it makes
more sense for the UI.

>> There's surely room for improvement here, but I just wanted to address
>> the very common case of sticking "git describe" output into commit
>> messages without trying to link to all possible refnames, that's going
>> to be a rather futile exercise given that this is free text, and it
>> would be prohibitively expensive to look up whether the references in
>> question exist in our repository.
>>
>> There was on-list discussion about how we could do better than this
>> patch. Junio suggested to update parse_commits() to call a new
>> "gitweb--helper" command which would pass each of the revision
>> candidates through "rev-parse --verify --quiet". That would cut down
>> on our false positives (e.g. we'll link to "deadbeef"), and also allow
>> us to be more aggressive in selecting candidate revisions.
>>
>> That may be too expensive to work in practice, or it may
>> not. Investigating that would be a good follow-up to this patch.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  gitweb/gitweb.perl | 18 ++++++++++++++++--
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 92b5e91..7cf68f0 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>>       my $line = shift;
>>
>>       $line = esc_html($line, -nbsp=>1);
>> -     $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>> +     $line =~ s{
>> +        \b
>> +        (
>> +            # The output of "git describe", e.g. v2.10.0-297-gf6727b0
>> +            # or hadoop-20160921-113441-20-g094fb7d
>> +            (?<!-) # see strbuf_check_tag_ref(). Tags can't start with -
>> +            [A-Za-z0-9.-]+
>> +            (?!\.) # refs can't end with ".", see check_refname_format()
>> +            -g[0-9a-fA-F]{7,40}
>> +            |
>> +            # Just a normal looking Git SHA1
>> +            [0-9a-fA-F]{7,40}
>> +        )
>> +        \b
>> +    }{
>>               $cgi->a({-href => href(action=>"object", hash=>$1),
>>                                       -class => "text"}, $1);
>> -     }eg;
>> +     }egx;
>>
>>       return $line;
>>  }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/3] gitweb: Fix a typo in a comment
  2016-10-06  9:11 ` [PATCH v2 1/3] gitweb: Fix a typo in a comment Ævar Arnfjörð Bjarmason
@ 2016-10-14 17:34   ` Jakub Narębski
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Narębski @ 2016-10-14 17:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason napisał:

> Change a typo'd MIME type in a comment. The Content-Type is
> application/xhtml+xml, not application/xhtm+xml.
> 
> Fixes up code originally added in 53c4031 ("gitweb: Strip
> non-printable characters from syntax highlighter output", 2011-09-16).
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Good.  Thanks for taking care of this.
For what is worth for such a trivial patch:

Acked-by: Jakub Narębski <jnareb@gmail.com>

> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 44094f4..cba7405 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1616,7 +1616,7 @@ sub esc_path {
>  	return $str;
>  }
>  
> -# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
> +# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)
>  sub sanitize {
>  	my $str = shift;
>  
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
  2016-10-06  9:11 ` [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+ Ævar Arnfjörð Bjarmason
  2016-10-06 19:44   ` Junio C Hamano
@ 2016-10-14 17:45   ` Jakub Narębski
  2016-10-14 18:40     ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Jakub Narębski @ 2016-10-14 17:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason pisze:

> Change the minimum length of an abbreviated object identifier in the
> commit message gitweb tries to turn into link from 8 hexchars to 7.
> 
> This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb:
> SHA-1 in commit log message links to "object" view", 2006-12-10), but
> the default abbreviation length is 7, and has been for a long time.
> 
> It's still possible to reference SHA1s down to 4 characters in length,
> see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
> git actually produce that, so I doubt anyone is putting that into log
> messages in practice, but people definitely do put 7 character SHA1s
> into log messages.

That's nice explanation why we want to support 7 character SHA-1
(it is the default, and was seen in the wild), but don't need to
support shorter lengths.

Another reason (just for completeness; you don't need to put it in
the commit message) to not go below 7 characters is that with 
decreasing length there is more and more chance for false positives
(like 'beef' or 'caffee' for 4-char or 5-char hexstring), as I wrote
previously.

s/SHA1/SHA-1/g in above paragraph (for correctness and consistency).

> 
> I think it's fairly dubious to link to things matching [0-9a-fA-F]
> here as opposed to just [0-9a-f], that dates back to the initial
> version of gitweb from 161332a ("first working version",
> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
> them as far as I can tell.

All right.  If we decide to be more strict in what we accept, we can
do it in a separate commit.

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Acked-by: Jakub Narębski <jnareb@gmail.com>

> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index cba7405..92b5e91 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>  	my $line = shift;
>  
>  	$line = esc_html($line, -nbsp=>1);
> -	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> +	$line =~ s{\b([0-9a-fA-F]{7,40})\b}{

By the way, it is quite long commit message for one character change.
Not that it is a bad thing...

>  		$cgi->a({-href => href(action=>"object", hash=>$1),
>  					-class => "text"}, $1);
>  	}eg;
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
  2016-10-09 11:20     ` Ævar Arnfjörð Bjarmason
@ 2016-10-14 17:50       ` Ævar Arnfjörð Bjarmason
  2016-10-14 19:00         ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-10-14 17:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git, Jakub Narebski

On Sun, Oct 9, 2016 at 1:20 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Thu, Oct 6, 2016 at 9:51 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> Change the log formatting function to know about "git describe" output
>>> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
>>>
>>> There are still many valid refnames that we don't link to
>>> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
>>> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
>>> similarly it's trivially possible to create some refnames like
>>> "æ/var-gf6727b0" or which won't be picked up by this regex.
>>
>> Not a serious counter-proposal or suggestion, and certainly not an
>> objection to the patch I am responding to, but I wonder if it is so
>> bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08.
>>
>> IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we
>> allowed an optional 'g' in front of the hex, e.g.
>>
>> -       $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>> +       $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{
>>
>> wouldn't that be much simpler, covers more cases and sufficient?
>
> It would be more simpler, maybe that's the right approach. I have a
> preference for making the entire reference a link. I think it makes
> more sense for the UI.

I just ran into an example of a better reason for doing it like my
patch is doing, which is that if you have some tag like:

deployment-20160928-171914-16-g42e13d8

 With my patch the whole thing will be a link to the 42e13d8 commit,
but with this suggestion both 20160928 and 42e13d8 would become commit
links, the former one would be broken.

Of course we could have some code that would detect that the whole \S+
is part of one thing that ends in g<commit>, but the complexity in
doing that would be equivalent or more to doing what my patch is
doing.

>>> There's surely room for improvement here, but I just wanted to address
>>> the very common case of sticking "git describe" output into commit
>>> messages without trying to link to all possible refnames, that's going
>>> to be a rather futile exercise given that this is free text, and it
>>> would be prohibitively expensive to look up whether the references in
>>> question exist in our repository.
>>>
>>> There was on-list discussion about how we could do better than this
>>> patch. Junio suggested to update parse_commits() to call a new
>>> "gitweb--helper" command which would pass each of the revision
>>> candidates through "rev-parse --verify --quiet". That would cut down
>>> on our false positives (e.g. we'll link to "deadbeef"), and also allow
>>> us to be more aggressive in selecting candidate revisions.
>>>
>>> That may be too expensive to work in practice, or it may
>>> not. Investigating that would be a good follow-up to this patch.
>>>
>>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>>> ---
>>>  gitweb/gitweb.perl | 18 ++++++++++++++++--
>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>>> index 92b5e91..7cf68f0 100755
>>> --- a/gitweb/gitweb.perl
>>> +++ b/gitweb/gitweb.perl
>>> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>>>       my $line = shift;
>>>
>>>       $line = esc_html($line, -nbsp=>1);
>>> -     $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>>> +     $line =~ s{
>>> +        \b
>>> +        (
>>> +            # The output of "git describe", e.g. v2.10.0-297-gf6727b0
>>> +            # or hadoop-20160921-113441-20-g094fb7d
>>> +            (?<!-) # see strbuf_check_tag_ref(). Tags can't start with -
>>> +            [A-Za-z0-9.-]+
>>> +            (?!\.) # refs can't end with ".", see check_refname_format()
>>> +            -g[0-9a-fA-F]{7,40}
>>> +            |
>>> +            # Just a normal looking Git SHA1
>>> +            [0-9a-fA-F]{7,40}
>>> +        )
>>> +        \b
>>> +    }{
>>>               $cgi->a({-href => href(action=>"object", hash=>$1),
>>>                                       -class => "text"}, $1);
>>> -     }eg;
>>> +     }egx;
>>>
>>>       return $line;
>>>  }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
  2016-10-14 17:45   ` Jakub Narębski
@ 2016-10-14 18:40     ` Junio C Hamano
  2016-10-15  8:11       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-10-14 18:40 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Ævar Arnfjörð Bjarmason, git

Jakub Narębski <jnareb@gmail.com> writes:

> s/SHA1/SHA-1/g in above paragraph (for correctness and consistency).
>> 
>> I think it's fairly dubious to link to things matching [0-9a-fA-F]
>> here as opposed to just [0-9a-f], that dates back to the initial
>> version of gitweb from 161332a ("first working version",
>> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
>> them as far as I can tell.
>
> All right.  If we decide to be more strict in what we accept, we can
> do it in a separate commit.
>
>> 
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>
> Acked-by: Jakub Narębski <jnareb@gmail.com>

Thanks for a review.  As the topic is not yet in 'next', I'll squish
in your Acked-by: to them.  I saw them only for 1 & 2/3; would
another for 3/3 be coming soon?

>
>> ---
>>  gitweb/gitweb.perl | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index cba7405..92b5e91 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>>  	my $line = shift;
>>  
>>  	$line = esc_html($line, -nbsp=>1);
>> -	$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
>> +	$line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>
> By the way, it is quite long commit message for one character change.
> Not that it is a bad thing...
>
>>  		$cgi->a({-href => href(action=>"object", hash=>$1),
>>  					-class => "text"}, $1);
>>  	}eg;
>> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
  2016-10-14 17:50       ` Ævar Arnfjörð Bjarmason
@ 2016-10-14 19:00         ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2016-10-14 19:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git, Jakub Narebski

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I just ran into an example of a better reason for doing it like my
> patch is doing, which is that if you have some tag like:
>
> deployment-20160928-171914-16-g42e13d8
>
> With my patch the whole thing will be a link to the 42e13d8 commit,
> but with this suggestion both 20160928 and 42e13d8 would become commit
> links, the former one would be broken.
>
> Of course we could have some code that would detect that the whole \S+
> is part of one thing that ends in g<commit>,...

I think that this example shows a flaw not in the "suffix that looks
like an object name" approach, but more in the boundary regexp,
namely, the \b part; it is probably too loose.

And \S+ is not the right cue, either, for that matter.  IOW, "we
only should take hexstring, optionally prefixed with 'g', that
appears before the whitespace" is too strict, as a sentence

    We broke the system with deployment-g42e13d8.

does want to link to 42e13d8, even though full-stop at the end is
not whitespace, and the existing regexp uses \b there as a rough
equivalent to saying "Here must be a whitespace or punctuation".

An attempt to tighten "what a punctuation is" by excluding '-' may
make that "timestamp is in the tagname" example work, but is not a
good solution, either, because two sentences can be concatenated
with an em-dash that is often typed as two hyphen in plain text,
resulting in something like

    We broke the system with deployment-g42e13d8--sigh.

and we do want to treat the '-' after 42e13d8 as a punctuation after
a described object name.
    
So I agree 3/3 is good thing to do as-is.

Thanks.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
  2016-10-06  9:11 ` [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages Ævar Arnfjörð Bjarmason
  2016-10-06 19:51   ` Junio C Hamano
@ 2016-10-14 20:06   ` Jakub Narębski
  1 sibling, 0 replies; 16+ messages in thread
From: Jakub Narębski @ 2016-10-14 20:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason pisze:
> Change the log formatting function to know about "git describe" output
> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
> 
> There are still many valid refnames that we don't link to
> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
> similarly it's trivially possible to create some refnames like
> "æ/var-gf6727b0" or which won't be picked up by this regex.

It's important to cover most common cases occurring naturally in
people's repositories, while trying to avoid false positives (the latter
is important more now, where gitweb doesn't check for rev name validity).

I guess that most common is to use non-hierarchical tags with ASCII-only
names for describe output, so while "æ/var-gf6727b0" won't be picked,
it is IMVHO also much less likely to occur.

> 
> There's surely room for improvement here, but I just wanted to address
> the very common case of sticking "git describe" output into commit
> messages without trying to link to all possible refnames, that's going
> to be a rather futile exercise given that this is free text, and it
> would be prohibitively expensive to look up whether the references in
> question exist in our repository.
> 
> There was on-list discussion about how we could do better than this
> patch. Junio suggested to update parse_commits() to call a new
> "gitweb--helper" command which would pass each of the revision
> candidates through "rev-parse --verify --quiet". That would cut down
> on our false positives (e.g. we'll link to "deadbeef"), and also allow
> us to be more aggressive in selecting candidate revisions.
> 
> That may be too expensive to work in practice, or it may
> not. Investigating that would be a good follow-up to this patch.

All right.  That's good and enough for one patch.

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Acked-by: Jakub Narębski <jnareb@gmail.com>

BTW. to add to whole "git describe" output or not discussion: having link
span whole revision marker makes for easier to use UI: larger are to hit.

> ---
>  gitweb/gitweb.perl | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 92b5e91..7cf68f0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>  	my $line = shift;
>  
>  	$line = esc_html($line, -nbsp=>1);
> -	$line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> +	$line =~ s{
> +        \b
> +        (
> +            # The output of "git describe", e.g. v2.10.0-297-gf6727b0
> +            # or hadoop-20160921-113441-20-g094fb7d
> +            (?<!-) # see strbuf_check_tag_ref(). Tags can't start with -
> +            [A-Za-z0-9.-]+
> +            (?!\.) # refs can't end with ".", see check_refname_format()
> +            -g[0-9a-fA-F]{7,40}
> +            |
> +            # Just a normal looking Git SHA1
> +            [0-9a-fA-F]{7,40}
> +        )
> +        \b

Nice using of eXplained regexp.  It is much easier to understand with
comments.

> +    }{
>  		$cgi->a({-href => href(action=>"object", hash=>$1),
>  					-class => "text"}, $1);
> -	}eg;
> +	}egx;
>  
>  	return $line;
>  }
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
  2016-10-14 18:40     ` Junio C Hamano
@ 2016-10-15  8:11       ` Ævar Arnfjörð Bjarmason
  2016-10-17 16:54         ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-10-15  8:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narębski, Git Mailing List

On Fri, Oct 14, 2016 at 8:40 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jakub Narębski <jnareb@gmail.com> writes:
>
>> s/SHA1/SHA-1/g in above paragraph (for correctness and consistency).
>>>
>>> I think it's fairly dubious to link to things matching [0-9a-fA-F]
>>> here as opposed to just [0-9a-f], that dates back to the initial
>>> version of gitweb from 161332a ("first working version",
>>> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
>>> them as far as I can tell.
>>
>> All right.  If we decide to be more strict in what we accept, we can
>> do it in a separate commit.
>>
>>>
>>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>>
>> Acked-by: Jakub Narębski <jnareb@gmail.com>
>
> Thanks for a review.  As the topic is not yet in 'next', I'll squish
> in your Acked-by: to them.  I saw them only for 1 & 2/3; would
> another for 3/3 be coming soon?

As far as I can tell the only outstanding "change this" is your
s/SHA1/SHA-1/ in <xmqq37k9jm86.fsf@gitster.mtv.corp.google.com>, do
you want to fix that up or should I submit another series?

>>
>>> ---
>>>  gitweb/gitweb.perl | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>>> index cba7405..92b5e91 100755
>>> --- a/gitweb/gitweb.perl
>>> +++ b/gitweb/gitweb.perl
>>> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>>>      my $line = shift;
>>>
>>>      $line = esc_html($line, -nbsp=>1);
>>> -    $line =~ s{\b([0-9a-fA-F]{8,40})\b}{
>>> +    $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>>
>> By the way, it is quite long commit message for one character change.
>> Not that it is a bad thing...
>>
>>>              $cgi->a({-href => href(action=>"object", hash=>$1),
>>>                                      -class => "text"}, $1);
>>>      }eg;
>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
  2016-10-15  8:11       ` Ævar Arnfjörð Bjarmason
@ 2016-10-17 16:54         ` Junio C Hamano
  2016-10-17 19:46           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-10-17 16:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jakub Narębski, Git Mailing List

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> As far as I can tell the only outstanding "change this" is your
> s/SHA1/SHA-1/ in <xmqq37k9jm86.fsf@gitster.mtv.corp.google.com>, do
> you want to fix that up or should I submit another series?

I think I did that already myself while queuing.  Could you fetch
what I queued on 'pu' to double check?

I think the diff between what was posted and what is queued (I just
checked) looks like this:

-gitweb: Link to 7-char+ SHA1s, not only 8-char+
+gitweb: link to 7-char+ SHA-1s, not only 8-char+

 Change the minimum length of an abbreviated object identifier in the
 commit message gitweb tries to turn into link from 8 hexchars to 7.
 
@@ -5,16 +12,18 @@
 SHA-1 in commit log message links to "object" view", 2006-12-10), but
 the default abbreviation length is 7, and has been for a long time.
 
-It's still possible to reference SHA1s down to 4 characters in length,
+It's still possible to reference SHA-1s down to 4 characters in length,
 see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
 git actually produce that, so I doubt anyone is putting that into log
-messages in practice, but people definitely do put 7 character SHA1s
+messages in practice, but people definitely do put 7 character SHA-1s
 into log messages.
 
 I think it's fairly dubious to link to things matching [0-9a-fA-F]
 here as opposed to just [0-9a-f], that dates back to the initial
 version of gitweb from 161332a ("first working version",
-2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
+2005-08-07). Git will accept all-caps SHA-1s, but didn't ever produce
 them as far as I can tell.
 
 Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
+Acked-by: Jakub Narębski <jnareb@gmail.com>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
  2016-10-17 16:54         ` Junio C Hamano
@ 2016-10-17 19:46           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-10-17 19:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narębski, Git Mailing List

On Mon, Oct 17, 2016 at 6:54 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> As far as I can tell the only outstanding "change this" is your
>> s/SHA1/SHA-1/ in <xmqq37k9jm86.fsf@gitster.mtv.corp.google.com>, do
>> you want to fix that up or should I submit another series?
>
> I think I did that already myself while queuing.  Could you fetch
> what I queued on 'pu' to double check?

Thanks, looked at it, looks good to me!
> I think the diff between what was posted and what is queued (I just
> checked) looks like this:
>
> -gitweb: Link to 7-char+ SHA1s, not only 8-char+
> +gitweb: link to 7-char+ SHA-1s, not only 8-char+
>
>  Change the minimum length of an abbreviated object identifier in the
>  commit message gitweb tries to turn into link from 8 hexchars to 7.
>
> @@ -5,16 +12,18 @@
>  SHA-1 in commit log message links to "object" view", 2006-12-10), but
>  the default abbreviation length is 7, and has been for a long time.
>
> -It's still possible to reference SHA1s down to 4 characters in length,
> +It's still possible to reference SHA-1s down to 4 characters in length,
>  see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
>  git actually produce that, so I doubt anyone is putting that into log
> -messages in practice, but people definitely do put 7 character SHA1s
> +messages in practice, but people definitely do put 7 character SHA-1s
>  into log messages.
>
>  I think it's fairly dubious to link to things matching [0-9a-fA-F]
>  here as opposed to just [0-9a-f], that dates back to the initial
>  version of gitweb from 161332a ("first working version",
> -2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
> +2005-08-07). Git will accept all-caps SHA-1s, but didn't ever produce
>  them as far as I can tell.
>
>  Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> +Acked-by: Jakub Narębski <jnareb@gmail.com>
> +Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-10-17 19:47 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-06  9:11 [PATCH v2 0/3] gitweb: Be smarter about linking to SHA1s in log messages Ævar Arnfjörð Bjarmason
2016-10-06  9:11 ` [PATCH v2 1/3] gitweb: Fix a typo in a comment Ævar Arnfjörð Bjarmason
2016-10-14 17:34   ` Jakub Narębski
2016-10-06  9:11 ` [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+ Ævar Arnfjörð Bjarmason
2016-10-06 19:44   ` Junio C Hamano
2016-10-14 17:45   ` Jakub Narębski
2016-10-14 18:40     ` Junio C Hamano
2016-10-15  8:11       ` Ævar Arnfjörð Bjarmason
2016-10-17 16:54         ` Junio C Hamano
2016-10-17 19:46           ` Ævar Arnfjörð Bjarmason
2016-10-06  9:11 ` [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages Ævar Arnfjörð Bjarmason
2016-10-06 19:51   ` Junio C Hamano
2016-10-09 11:20     ` Ævar Arnfjörð Bjarmason
2016-10-14 17:50       ` Ævar Arnfjörð Bjarmason
2016-10-14 19:00         ` Junio C Hamano
2016-10-14 20:06   ` Jakub Narębski

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).