git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] userdiff: add built-in pattern for CSS
@ 2016-05-20 13:28 William Duclot
  2016-05-20 22:37 ` Junio C Hamano
  2016-05-24 14:25 ` William Duclot
  0 siblings, 2 replies; 21+ messages in thread
From: William Duclot @ 2016-05-20 13:28 UTC (permalink / raw)
  To: git; +Cc: simon.rabourg, francois.beutin, antoine.queru, Matthieu Moy

CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
---
 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh        |  1 +
 t/t4018/css-rule                |  4 ++++
 t/t4034-diff-words.sh           |  1 +
 t/t4034/css/expect              | 16 ++++++++++++++++
 t/t4034/css/post                |  9 +++++++++
 t/t4034/css/pre                 |  9 +++++++++
 userdiff.c                      |  8 ++++++++
 8 files changed, 50 insertions(+)
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
 	bibtex
 	cpp
 	csharp
+	css
 	fortran
 	fountain
 	html
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 0000000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect b/t/t4034/css/expect
new file mode 100644
index 0000000..b025d88
--- /dev/null
+++ b/t/t4034/css/expect
@@ -0,0 +1,16 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 735f301..bdf6a90 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,9 +1,9 @@<RESET>
+.<RED>class-form<RESET><GREEN>other-form<RESET> label.control-label {
+    margin-top: <RED>10<RESET><GREEN>15<RESET>px!important;
+    border : 10px <RED>dashed<RESET><GREEN>dotted<RESET> #C6C6C6;
+}<RESET>
+<RED>#CCCCCC<RESET>
+<RED>padding-bottom<RESET><GREEN>#CCCCCB<RESET>
+<GREEN>margin-left<RESET>
+150<RED>px<RESET><GREEN>em<RESET>
+10px
+<RED>!important<RESET>
+<RED>div<RESET><GREEN>li<RESET>.class#id
diff --git a/t/t4034/css/post b/t/t4034/css/post
new file mode 100644
index 0000000..bdf6a90
--- /dev/null
+++ b/t/t4034/css/post
@@ -0,0 +1,9 @@
+.other-form label.control-label {
+    margin-top: 15px!important;
+    border : 10px dotted #C6C6C6;
+}
+#CCCCCB
+margin-left
+150em
+10px
+li.class#id
diff --git a/t/t4034/css/pre b/t/t4034/css/pre
new file mode 100644
index 0000000..735f301
--- /dev/null
+++ b/t/t4034/css/pre
@@ -0,0 +1,9 @@
+.class-form label.control-label {
+    margin-top: 10px!important;
+    border : 10px dashed #C6C6C6;
+}
+#CCCCCC
+padding-bottom
+150px
+10px!important
+div.class#id
diff --git a/userdiff.c b/userdiff.c
index 6bf2505..0f9cfbe 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -148,6 +148,14 @@ PATTERNS("csharp",
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+PATTERNS("css",
+	 "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
+	 /* -- */
+	 /* This regex comes from W3C CSS specs. Should theoretically also allow ISO 10646 characters U+00A0 and higher,
+	  * this not handled in this regex. */
+	 "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
+	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
+),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
-- 
2.8.2.403.gdf0b511.dirty

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-05-20 13:28 [PATCH] userdiff: add built-in pattern for CSS William Duclot
@ 2016-05-20 22:37 ` Junio C Hamano
  2016-05-24 14:25 ` William Duclot
  1 sibling, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2016-05-20 22:37 UTC (permalink / raw)
  To: William Duclot
  Cc: git, simon.rabourg, francois.beutin, antoine.queru, Matthieu Moy

William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:

> CSS is widely used, motivating it being included as a built-in pattern.
>
> It must be noted that the word_regex for CSS (i.e. the regex defining
> what is a word in the language) does not consider '.' and '#' characters
> (in CSS selectors) to be part of the word. This behavior is documented
> by the test t/t4018/css-rule.
> The logic behind this behavior is the following: identifiers in CSS
> selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
> character are not part of the identifier, but an indicator of the nature
> of the identifier in HTML/XML (class or id). Diffing ".class1" and
> ".class2" must show that the class name is changed, but we still are
> selecting a class.

In other words, if "div#foo" changed to "span#bar", word-diff would
say that "div changed to span, # didn't change and foo changed to
bar".

Which makes sense to me.

The above is not a request to change anything; just me thinking
aloud to see if I agree with the reasoning.

> diff --git a/userdiff.c b/userdiff.c
> index 6bf2505..0f9cfbe 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -148,6 +148,14 @@ PATTERNS("csharp",
>  	 "[a-zA-Z_][a-zA-Z0-9_]*"
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
> +PATTERNS("css",
> +	 "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
> +	 /* -- */
> +	 /* This regex comes from W3C CSS specs. Should theoretically also allow ISO 10646 characters U+00A0 and higher,
> +	  * this not handled in this regex. */
> +	 "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
> +	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> +),

Style:

	/*
         * This regex comes from ...
         * ...
         * but they are not handled with this regex.
         */

I wonder if IPATTERN() may make it easier to express the above.

Also, a-zA-F (twice seen in "identifiers" section) looks somewhat
suspicious.  a-fA-F or a-zA-Z I would understand, and I suspect this
is a misspelled form of the latter.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] userdiff: add built-in pattern for CSS
  2016-05-20 13:28 [PATCH] userdiff: add built-in pattern for CSS William Duclot
  2016-05-20 22:37 ` Junio C Hamano
@ 2016-05-24 14:25 ` William Duclot
  2016-05-24 19:06   ` Junio C Hamano
                     ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: William Duclot @ 2016-05-24 14:25 UTC (permalink / raw)
  To: git; +Cc: simon.rabourg, francois.beutin, antoine.queru, gitster,
	Matthieu Moy

CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
---
changes since v1:
Fix a typo in the word_regex ("A-F" => "A-Z").
Clearer comment about ISO 10646 characters.

 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh        |  1 +
 t/t4018/css-rule                |  4 ++++
 t/t4034-diff-words.sh           |  1 +
 t/t4034/css/expect              | 16 ++++++++++++++++
 t/t4034/css/post                | 10 ++++++++++
 t/t4034/css/pre                 | 10 ++++++++++
 userdiff.c                      |  8 ++++++++
 8 files changed, 52 insertions(+)
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
 	bibtex
 	cpp
 	csharp
+	css
 	fortran
 	fountain
 	html
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 0000000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect b/t/t4034/css/expect
new file mode 100644
index 0000000..ed10393
--- /dev/null
+++ b/t/t4034/css/expect
@@ -0,0 +1,16 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index b8ae0bb..fe500b7 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,10 +1,10 @@<RESET>
+.<RED>class-form<RESET><GREEN>other-form<RESET> label.control-label {
+    margin-top: <RED>10<RESET><GREEN>15<RESET>px!important;
+    border : 10px <RED>dashed<RESET><GREEN>dotted<RESET> #C6C6C6;
+}<RESET>
+<RED>#CCCCCC<RESET><GREEN>#CCCCCB<RESET>
+10em<RESET>
+<RED>padding-bottom<RESET><GREEN>margin-left<RESET>
+150<RED>px<RESET><GREEN>em<RESET>
+10px
+<RED>!important<RESET>
+<RED>div<RESET><GREEN>li<RESET>.class#id
diff --git a/t/t4034/css/post b/t/t4034/css/post
new file mode 100644
index 0000000..fe500b7
--- /dev/null
+++ b/t/t4034/css/post
@@ -0,0 +1,10 @@
+.other-form label.control-label {
+    margin-top: 15px!important;
+    border : 10px dotted #C6C6C6;
+}
+#CCCCCB
+10em
+margin-left
+150em
+10px
+li.class#id
diff --git a/t/t4034/css/pre b/t/t4034/css/pre
new file mode 100644
index 0000000..b8ae0bb
--- /dev/null
+++ b/t/t4034/css/pre
@@ -0,0 +1,10 @@
+.class-form label.control-label {
+    margin-top: 10px!important;
+    border : 10px dashed #C6C6C6;
+}
+#CCCCCC
+10em
+padding-bottom
+150px
+10px!important
+div.class#id
diff --git a/userdiff.c b/userdiff.c
index 6bf2505..9273969 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -148,6 +148,14 @@ PATTERNS("csharp",
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+PATTERNS("css",
+	 "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
+	 /* -- */
+	 /* This regex comes from W3C CSS specs. Should theoretically also allow ISO 10646 characters U+00A0 and higher,
+	  * but they are not handled with this regex. */
+	 "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
+	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
+),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
-- 
2.9.0.rc0.1.g521d471.dirty

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-05-24 14:25 ` William Duclot
@ 2016-05-24 19:06   ` Junio C Hamano
  2016-05-24 22:12     ` William Duclot
  2016-05-26 21:11   ` Johannes Sixt
  2016-06-02 22:48   ` William Duclot
  2 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2016-05-24 19:06 UTC (permalink / raw)
  To: William Duclot
  Cc: git, simon.rabourg, francois.beutin, antoine.queru, Matthieu Moy

William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:

> CSS is widely used, motivating it being included as a built-in pattern.
>
> It must be noted that the word_regex for CSS (i.e. the regex defining
> what is a word in the language) does not consider '.' and '#' characters
> (in CSS selectors) to be part of the word. This behavior is documented
> by the test t/t4018/css-rule.
> The logic behind this behavior is the following: identifiers in CSS
> selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
> character are not part of the identifier, but an indicator of the nature
> of the identifier in HTML/XML (class or id). Diffing ".class1" and
> ".class2" must show that the class name is changed, but we still are
> selecting a class.
>
> Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
> Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
> ---
> changes since v1:
> Fix a typo in the word_regex ("A-F" => "A-Z").
> Clearer comment about ISO 10646 characters.

It is not a big deal for a small single-patch topic like this, but
it often is hard to reviewers if you do not respond to comments you
received and instead just send a new version of the patch with
"changes since..." comment.  Please make it a habit to do both.

I can see in the above "changes since v1" comment, you took the
A-F/A-Z thing, but I cannot tell if you thought about PATTERNS vs
IPATTERN and rejected IPATTERN with a good reason or if you simply
missed it when reading the review comments you received, for
example.

Three remaining issues are:

 - Have you considered using IPATTERN()?  PATTERNS() that defaults
   case sensitive match is suitable for real languages with fixed
   case keywords, but the pattern you are defining for CSS does not
   do anything special for any set of fixed-case built-in keywords,
   and appears to be better served by IPATTERN().

 - In our codebase, we format multi-line comments in a particular
   way, namely

	/*
         * A multi-line comment begins with slash asterisk
         * on its own line, and its closing asterisk slash
         * also is on its own line.
         */

 - Try not to write overlong lines.  If your expression needs to
   become long and there is no good place to fold lines, that is one
   thing, but an overlong comment is unexcuable, as you can fold
   lines anywhere between words.

Thanks.

> diff --git a/userdiff.c b/userdiff.c
> index 6bf2505..9273969 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -148,6 +148,14 @@ PATTERNS("csharp",
>  	 "[a-zA-Z_][a-zA-Z0-9_]*"
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
> +PATTERNS("css",
> +	 "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
> +	 /* -- */
> +	 /* This regex comes from W3C CSS specs. Should theoretically also allow ISO 10646 characters U+00A0 and higher,
> +	  * but they are not handled with this regex. */
> +	 "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
> +	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> +),
>  { "default", NULL, -1, { NULL, 0 } },
>  };
>  #undef PATTERNS

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-05-24 19:06   ` Junio C Hamano
@ 2016-05-24 22:12     ` William Duclot
  2016-05-24 22:18       ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: William Duclot @ 2016-05-24 22:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, simon rabourg, francois beutin, antoine queru, Matthieu Moy

> It is not a big deal for a small single-patch topic like this, but
> it often is hard to reviewers if you do not respond to comments you
> received and instead just send a new version of the patch with
> "changes since..." comment.  Please make it a habit to do both.

Apologies, I am not quite used to work with a mailing list yet, I
will make sure to respect this in the future!
 
>  - Have you considered using IPATTERN()?  PATTERNS() that defaults
>    case sensitive match is suitable for real languages with fixed
>    case keywords, but the pattern you are defining for CSS does not
>    do anything special for any set of fixed-case built-in keywords,
>    and appears to be better served by IPATTERN().

I did have considered IPATTERN(), but assumed that case-sensitive was
default and case-insensitive was the exception. As the CSS pattern
does not deal with letters at all it seemed sensible to me to follow
the example of the HTML pattern, which use PATTERNS().

>  - In our codebase, we format multi-line comments in a particular
>    way, namely
> 
> 	/*
>          * A multi-line comment begins with slash asterisk
>          * on its own line, and its closing asterisk slash
>          * also is on its own line.
>          */

I take good note of that. I took example on the fortran pattern
comment, should I fix it too while I'm at it?
 
>  - Try not to write overlong lines.  If your expression needs to
>    become long and there is no good place to fold lines, that is one
>    thing, but an overlong comment is unexcuable, as you can fold
>    lines anywhere between words.

Again, I take good note of that.

Thank you for your time!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-05-24 22:12     ` William Duclot
@ 2016-05-24 22:18       ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2016-05-24 22:18 UTC (permalink / raw)
  To: William Duclot
  Cc: git, simon rabourg, francois beutin, antoine queru, Matthieu Moy

William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:

> I did have considered IPATTERN(), but assumed that case-sensitive was
> default and case-insensitive was the exception.

Don't assume, but ask ;-)

> As the CSS pattern
> does not deal with letters at all it seemed sensible to me to follow
> the example of the HTML pattern, which use PATTERNS().

Did you notice that HTML pattern has to do an [Hh] that would be
unnecessary if it chose to use IPATTTERN()?

You do not have to ask a person, but instead ask the history.
IPATTERN() was added at 909a5494 (userdiff.c: add builtin fortran
regex patterns, 2010-09-10) when adding fortran support.  Anything
that existed before, including HTML, did [A-Za-z] when they could
have done [a-z] if IPATTERN() existed back then.

>>  - In our codebase, we format multi-line comments in a particular
>>    way, namely
>> 
>> 	   /*
>>          * A multi-line comment begins with slash asterisk
>>          * on its own line, and its closing asterisk slash
>>          * also is on its own line.
>>          */
>
> I take good note of that. I took example on the fortran pattern
> comment, should I fix it too while I'm at it?

Not "while you are at it".

Making existing things better is welcome but such a change shouldn't
be mixed with addition of new things.  You can do it as a separate
patch, probably as a preliminary clean-up before your change, if you
want to.

>>  - Try not to write overlong lines.  If your expression needs to
>>    become long and there is no good place to fold lines, that is one
>>    thing, but an overlong comment is unexcuable, as you can fold
>>    lines anywhere between words.
>
> Again, I take good note of that.
>
> Thank you for your time!

Thank you for working on this.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-05-24 14:25 ` William Duclot
  2016-05-24 19:06   ` Junio C Hamano
@ 2016-05-26 21:11   ` Johannes Sixt
  2016-05-27  7:48     ` William Duclot
  2016-06-02 22:48   ` William Duclot
  2 siblings, 1 reply; 21+ messages in thread
From: Johannes Sixt @ 2016-05-26 21:11 UTC (permalink / raw)
  To: William Duclot
  Cc: git, simon.rabourg, francois.beutin, antoine.queru, gitster,
	Matthieu Moy

Am 24.05.2016 um 16:25 schrieb William Duclot:
> +PATTERNS("css",
> +	 "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",

This hunk header pattern is a bit too restrictive for my taste. Find
below a few more test cases that you should squash in. One case fails
because only the first CSS selector is picked up, for which I do not
see a reason.

Another case fails because the opening brace is not on the line with
the CSS selectors.

I think what the hunk header pattern should do is:

1. reject lines containing a colon (because that are properties)
2. if a line begins with a name in column 1, pick the whole line

See the cpp patterns: a pattern beginning with ! is a "reject" pattern.


diff --git a/t/t4018/css-brace-in-col-1 b/t/t4018/css-brace-in-col-1
new file mode 100644
index 0000000..7831577
--- /dev/null
+++ b/t/t4018/css-brace-in-col-1
@@ -0,0 +1,5 @@
+RIGHT label.control-label
+{
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-rule b/t/t4018/css-common
similarity index 100%
rename from t/t4018/css-rule
rename to t/t4018/css-common
diff --git a/t/t4018/css-long-selector-list b/t/t4018/css-long-selector-list
new file mode 100644
index 0000000..7ccd25d
--- /dev/null
+++ b/t/t4018/css-long-selector-list
@@ -0,0 +1,6 @@
+p.header,
+label.control-label,
+div ul#RIGHT {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-prop-sans-indent b/t/t4018/css-prop-sans-indent
new file mode 100644
index 0000000..a9e3c86
--- /dev/null
+++ b/t/t4018/css-prop-sans-indent
@@ -0,0 +1,5 @@
+RIGHT, label.control-label {
+margin-top: 10px!important;
+padding: 0;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-short-selector-list b/t/t4018/css-short-selector-list
new file mode 100644
index 0000000..6a0bdee
--- /dev/null
+++ b/t/t4018/css-short-selector-list
@@ -0,0 +1,4 @@
+label.control, div ul#RIGHT {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
-- 
2.9.0.rc0.40.gb3c1388

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-05-26 21:11   ` Johannes Sixt
@ 2016-05-27  7:48     ` William Duclot
  0 siblings, 0 replies; 21+ messages in thread
From: William Duclot @ 2016-05-27  7:48 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: git, simon rabourg, francois beutin, antoine queru, gitster,
	Matthieu Moy

Junio C Hamano <gitster@pobox.com> writes:
> William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:
>
>> As the CSS pattern
>> does not deal with letters at all it seemed sensible to me to follow
>> the example of the HTML pattern, which use PATTERNS().
> 
> Did you notice that HTML pattern has to do an [Hh] that would be
> unnecessary if it chose to use IPATTTERN()?
> 
> You do not have to ask a person, but instead ask the history.
> IPATTERN() was added at 909a5494 (userdiff.c: add builtin fortran
> regex patterns, 2010-09-10) when adding fortran support.  Anything
> that existed before, including HTML, did [A-Za-z] when they could
> have done [a-z] if IPATTERN() existed back then.

I hadn't noticed that the HTML pattern was older, indeed

>>>  - In our codebase, we format multi-line comments in a particular
>>>    way, namely
>>> 
>>> 	   /*
>>>          * A multi-line comment begins with slash asterisk
>>>          * on its own line, and its closing asterisk slash
>>>          * also is on its own line.
>>>          */
>>
>> I take good note of that. I took example on the fortran pattern
>> comment, should I fix it too while I'm at it?
> 
> Not "while you are at it".
> 
> Making existing things better is welcome but such a change shouldn't
> be mixed with addition of new things.  You can do it as a separate
> patch, probably as a preliminary clean-up before your change, if you
> want to.

OK !


Johannes Sixt <j6t@kdbg.org> writes:
> Am 24.05.2016 um 16:25 schrieb William Duclot:
>> +PATTERNS("css",
>> +	 "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
> 
> This hunk header pattern is a bit too restrictive for my taste. Find
> below a few more test cases that you should squash in. One case fails
> because only the first CSS selector is picked up, for which I do not
> see a reason.
> 
> Another case fails because the opening brace is not on the line with
> the CSS selectors.

Yes, it seems you're right !

> I think what the hunk header pattern should do is:
> 
> 1. reject lines containing a colon (because that are properties)
> 2. if a line begins with a name in column 1, pick the whole line
> 
> See the cpp patterns: a pattern beginning with ! is a "reject" pattern.

That may be a good idea, I will look into that

> diff --git a/t/t4018/css-brace-in-col-1 b/t/t4018/css-brace-in-col-1
> new file mode 100644
> index 0000000..7831577
> --- /dev/null
> +++ b/t/t4018/css-brace-in-col-1
> @@ -0,0 +1,5 @@
> +RIGHT label.control-label
> +{
> +    margin-top: 10px!important;
> +    border : 10px ChangeMe #C6C6C6;
> +}
> diff --git a/t/t4018/css-rule b/t/t4018/css-common
> similarity index 100%
> rename from t/t4018/css-rule
> rename to t/t4018/css-common
> diff --git a/t/t4018/css-long-selector-list b/t/t4018/css-long-selector-list
> new file mode 100644
> index 0000000..7ccd25d
> --- /dev/null
> +++ b/t/t4018/css-long-selector-list
> @@ -0,0 +1,6 @@
> +p.header,
> +label.control-label,
> +div ul#RIGHT {
> +    margin-top: 10px!important;
> +    border : 10px ChangeMe #C6C6C6;
> +}
> diff --git a/t/t4018/css-prop-sans-indent b/t/t4018/css-prop-sans-indent
> new file mode 100644
> index 0000000..a9e3c86
> --- /dev/null
> +++ b/t/t4018/css-prop-sans-indent
> @@ -0,0 +1,5 @@
> +RIGHT, label.control-label {
> +margin-top: 10px!important;
> +padding: 0;
> +border : 10px ChangeMe #C6C6C6;
> +}
> diff --git a/t/t4018/css-short-selector-list
> b/t/t4018/css-short-selector-list
> new file mode 100644
> index 0000000..6a0bdee
> --- /dev/null
> +++ b/t/t4018/css-short-selector-list
> @@ -0,0 +1,4 @@
> +label.control, div ul#RIGHT {
> +    margin-top: 10px!important;
> +    border : 10px ChangeMe #C6C6C6;
> +}
> --
> 2.9.0.rc0.40.gb3c1388

Thanks for the test cases, I'll look into that as soon as I have time

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] userdiff: add built-in pattern for CSS
  2016-05-24 14:25 ` William Duclot
  2016-05-24 19:06   ` Junio C Hamano
  2016-05-26 21:11   ` Johannes Sixt
@ 2016-06-02 22:48   ` William Duclot
  2016-06-02 23:07     ` Junio C Hamano
                       ` (2 more replies)
  2 siblings, 3 replies; 21+ messages in thread
From: William Duclot @ 2016-06-02 22:48 UTC (permalink / raw)
  To: git
  Cc: simon.rabourg, antoine.queru, francois.beutin, j6t, gitster,
	Matthieu Moy

CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Logic behind the "pattern" regex is:
    1. reject lines containing a colon (properties)
    2. if a line begins with a name in column 1, pick the whole line

Credits to Johannes Sixt (j6t@kdbg.org) for the pattern regex and most
of the tests.

Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
---
Changes since V2:
    - pattern regex has changed
    - more tests

 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh        |  1 +
 t/t4018/css-brace-in-col-1      |  5 +++++
 t/t4018/css-common              |  4 ++++
 t/t4018/css-long-selector-list  |  6 ++++++
 t/t4018/css-prop-sans-indent    |  5 +++++
 t/t4018/css-rule                |  4 ++++
 t/t4018/css-short-selector-list |  4 ++++
 t/t4034-diff-words.sh           |  1 +
 t/t4034/css/expect              | 16 ++++++++++++++++
 t/t4034/css/post                | 10 ++++++++++
 t/t4034/css/pre                 | 10 ++++++++++
 userdiff.c                      | 12 ++++++++++++
 13 files changed, 80 insertions(+)
 create mode 100644 t/t4018/css-brace-in-col-1
 create mode 100644 t/t4018/css-common
 create mode 100644 t/t4018/css-long-selector-list
 create mode 100644 t/t4018/css-prop-sans-indent
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4018/css-short-selector-list
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
 	bibtex
 	cpp
 	csharp
+	css
 	fortran
 	fountain
 	html
diff --git a/t/t4018/css-brace-in-col-1 b/t/t4018/css-brace-in-col-1
new file mode 100644
index 0000000..7831577
--- /dev/null
+++ b/t/t4018/css-brace-in-col-1
@@ -0,0 +1,5 @@
+RIGHT label.control-label
+{
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-common b/t/t4018/css-common
new file mode 100644
index 0000000..84ed754
--- /dev/null
+++ b/t/t4018/css-common
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-long-selector-list b/t/t4018/css-long-selector-list
new file mode 100644
index 0000000..7ccd25d
--- /dev/null
+++ b/t/t4018/css-long-selector-list
@@ -0,0 +1,6 @@
+p.header,
+label.control-label,
+div ul#RIGHT {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-prop-sans-indent b/t/t4018/css-prop-sans-indent
new file mode 100644
index 0000000..a9e3c86
--- /dev/null
+++ b/t/t4018/css-prop-sans-indent
@@ -0,0 +1,5 @@
+RIGHT, label.control-label {
+margin-top: 10px!important;
+padding: 0;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 0000000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-short-selector-list b/t/t4018/css-short-selector-list
new file mode 100644
index 0000000..6a0bdee
--- /dev/null
+++ b/t/t4018/css-short-selector-list
@@ -0,0 +1,4 @@
+label.control, div ul#RIGHT {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect b/t/t4034/css/expect
new file mode 100644
index 0000000..ed10393
--- /dev/null
+++ b/t/t4034/css/expect
@@ -0,0 +1,16 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index b8ae0bb..fe500b7 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,10 +1,10 @@<RESET>
+.<RED>class-form<RESET><GREEN>other-form<RESET> label.control-label {
+    margin-top: <RED>10<RESET><GREEN>15<RESET>px!important;
+    border : 10px <RED>dashed<RESET><GREEN>dotted<RESET> #C6C6C6;
+}<RESET>
+<RED>#CCCCCC<RESET><GREEN>#CCCCCB<RESET>
+10em<RESET>
+<RED>padding-bottom<RESET><GREEN>margin-left<RESET>
+150<RED>px<RESET><GREEN>em<RESET>
+10px
+<RED>!important<RESET>
+<RED>div<RESET><GREEN>li<RESET>.class#id
diff --git a/t/t4034/css/post b/t/t4034/css/post
new file mode 100644
index 0000000..fe500b7
--- /dev/null
+++ b/t/t4034/css/post
@@ -0,0 +1,10 @@
+.other-form label.control-label {
+    margin-top: 15px!important;
+    border : 10px dotted #C6C6C6;
+}
+#CCCCCB
+10em
+margin-left
+150em
+10px
+li.class#id
diff --git a/t/t4034/css/pre b/t/t4034/css/pre
new file mode 100644
index 0000000..b8ae0bb
--- /dev/null
+++ b/t/t4034/css/pre
@@ -0,0 +1,10 @@
+.class-form label.control-label {
+    margin-top: 10px!important;
+    border : 10px dashed #C6C6C6;
+}
+#CCCCCC
+10em
+padding-bottom
+150px
+10px!important
+div.class#id
diff --git a/userdiff.c b/userdiff.c
index 6bf2505..00fc3bf 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -148,6 +148,18 @@ PATTERNS("csharp",
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+IPATTERN("css",
+	 "!^.*;\n"
+	 "^[_a-z0-9].*$",
+	 /* -- */
+	 /*
+	  * This regex comes from W3C CSS specs. Should theoretically also
+	  * allow ISO 10646 characters U+00A0 and higher,
+	  * but they are not handled in this regex.
+	  */
+	 "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
+	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
+),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
-- 
2.8.2.403.ge2646ba.dirty

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-02 22:48   ` William Duclot
@ 2016-06-02 23:07     ` Junio C Hamano
  2016-06-03  5:52     ` Johannes Sixt
  2016-06-03 12:32     ` William Duclot
  2 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2016-06-02 23:07 UTC (permalink / raw)
  To: William Duclot
  Cc: git, simon.rabourg, antoine.queru, francois.beutin, j6t,
	Matthieu Moy

William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:

> diff --git a/userdiff.c b/userdiff.c
> index 6bf2505..00fc3bf 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -148,6 +148,18 @@ PATTERNS("csharp",
>  	 "[a-zA-Z_][a-zA-Z0-9_]*"
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
> +IPATTERN("css",
> +	 "!^.*;\n"
> +	 "^[_a-z0-9].*$",
> +	 /* -- */
> +	 /*
> +	  * This regex comes from W3C CSS specs. Should theoretically also
> +	  * allow ISO 10646 characters U+00A0 and higher,
> +	  * but they are not handled in this regex.
> +	  */
> +	 "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
> +	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */

You could now lose A-F from the above two lines.  In fact, the first
line "identifiers" has "a-zA-F", which probably would work correctly
under IPATTERN() to include 'G-Z' as part of the legit letters for
identifiers, but is indeed misleading.

> +),
>  { "default", NULL, -1, { NULL, 0 } },
>  };
>  #undef PATTERNS

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-02 22:48   ` William Duclot
  2016-06-02 23:07     ` Junio C Hamano
@ 2016-06-03  5:52     ` Johannes Sixt
  2016-06-03  6:41       ` Matthieu Moy
  2016-06-03  9:45       ` William Duclot
  2016-06-03 12:32     ` William Duclot
  2 siblings, 2 replies; 21+ messages in thread
From: Johannes Sixt @ 2016-06-03  5:52 UTC (permalink / raw)
  To: William Duclot
  Cc: git, simon.rabourg, antoine.queru, francois.beutin, gitster,
	Matthieu Moy

Am 03.06.2016 um 00:48 schrieb William Duclot:
> Logic behind the "pattern" regex is:

The name of the macro parameter is "pattern", but the actual meaning is 
"function name" regex.

> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index e3b1de8..81f60ad 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -525,6 +525,8 @@ patterns are available:
>
>   - `csharp` suitable for source code in the C# language.
>
> +- `css` suitable for source code in the CSS language.

CSS is not so much source code. How about "suitable for cascaded style 
sheets"?

> diff --git a/t/t4018/css-common b/t/t4018/css-common
> new file mode 100644
> index 0000000..84ed754
> --- /dev/null
> +++ b/t/t4018/css-common
> @@ -0,0 +1,4 @@
> +RIGHT label.control-label {
> +    margin-top: 10px!important;
> +    border : 10px ChangeMe #C6C6C6;
> +}

> diff --git a/t/t4018/css-rule b/t/t4018/css-rule
> new file mode 100644
> index 0000000..84ed754
> --- /dev/null
> +++ b/t/t4018/css-rule
> @@ -0,0 +1,4 @@
> +RIGHT label.control-label {
> +    margin-top: 10px!important;
> +    border : 10px ChangeMe #C6C6C6;
> +}

These two are the same. Please pick only one. I propose the name 
"common" because it is how CSS rules are written most commonly.

> +IPATTERN("css",
> +	 "!^.*;\n"

Is there a difference between this and "!;\n"? Is it necessary to anchor 
the pattern at the beginning of the line?

In the commit message you talk about colon (':'), but you actually use a 
semicolon (';'). Thinking a bit more about it, rejecting lines with 
either one would be even better. Consider this case (without the 
indentation):

    h1 {
    color:
    red;
    }

(New test case, hint, hint!) Therefore, it could be: "![:;]\n".

> +	 "^[_a-z0-9].*$",
> +	 /* -- */
> +	 /*
> +	  * This regex comes from W3C CSS specs. Should theoretically also
> +	  * allow ISO 10646 characters U+00A0 and higher,
> +	  * but they are not handled in this regex.
> +	  */
> +	 "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */

Drop A-F.

> +	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */

Here, too: it is an IPATTERN.

-- Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-03  5:52     ` Johannes Sixt
@ 2016-06-03  6:41       ` Matthieu Moy
  2016-06-03  6:56         ` Johannes Sixt
  2016-06-03  9:45       ` William Duclot
  1 sibling, 1 reply; 21+ messages in thread
From: Matthieu Moy @ 2016-06-03  6:41 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: William Duclot, git, simon.rabourg, antoine.queru,
	francois.beutin, gitster

Johannes Sixt <j6t@kdbg.org> writes:

>> +IPATTERN("css",
>> +	 "!^.*;\n"
>
> Is there a difference between this and "!;\n"? Is it necessary to
> anchor the pattern at the beginning of the line?

Also, you don't want to force the end of line right after ;. I know
trailing whitespaces are evil, but users may not deserve to be punished
so hard when they have some ;-).

> In the commit message you talk about colon (':'), but you actually use
> a semicolon (';'). Thinking a bit more about it, rejecting lines with
> either one would be even better. Consider this case (without the
> indentation):

Rejecting colon anywhere in the line would also reject valid patterns
like this:

a:hover {

Rejecting it at end of line is probably a good trade-off.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-03  6:41       ` Matthieu Moy
@ 2016-06-03  6:56         ` Johannes Sixt
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Sixt @ 2016-06-03  6:56 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: William Duclot, git, simon.rabourg, antoine.queru,
	francois.beutin, gitster

Am 03.06.2016 um 08:41 schrieb Matthieu Moy:
> Rejecting colon anywhere in the line would also reject valid patterns
> like this:
>
> a:hover {
>
> Rejecting it at end of line is probably a good trade-off.

Good point. Could be worth another test case.

-- Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-03  5:52     ` Johannes Sixt
  2016-06-03  6:41       ` Matthieu Moy
@ 2016-06-03  9:45       ` William Duclot
  2016-06-03 15:50         ` Junio C Hamano
  1 sibling, 1 reply; 21+ messages in thread
From: William Duclot @ 2016-06-03  9:45 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: git, simon.rabourg, antoine.queru, francois.beutin, gitster,
	Matthieu Moy

On Fri, Jun 03, 2016 at 07:52:45AM +0200, Johannes Sixt wrote:
> Am 03.06.2016 um 00:48 schrieb William Duclot:
>>Logic behind the "pattern" regex is:
> 
> The name of the macro parameter is "pattern", but the actual meaning
> is "function name" regex.
> 
>>diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
>>index e3b1de8..81f60ad 100644
>>--- a/Documentation/gitattributes.txt
>>+++ b/Documentation/gitattributes.txt
>>@@ -525,6 +525,8 @@ patterns are available:
>>
>>  - `csharp` suitable for source code in the C# language.
>>
>>+- `css` suitable for source code in the CSS language.
> 
> CSS is not so much source code. How about "suitable for cascaded
> style sheets"?

Technically correct yes
 
>>diff --git a/t/t4018/css-common b/t/t4018/css-common
>>new file mode 100644
>>index 0000000..84ed754
>>--- /dev/null
>>+++ b/t/t4018/css-common
>>@@ -0,0 +1,4 @@
>>+RIGHT label.control-label {
>>+    margin-top: 10px!important;
>>+    border : 10px ChangeMe #C6C6C6;
>>+}
> 
>>diff --git a/t/t4018/css-rule b/t/t4018/css-rule
>>new file mode 100644
>>index 0000000..84ed754
>>--- /dev/null
>>+++ b/t/t4018/css-rule
>>@@ -0,0 +1,4 @@
>>+RIGHT label.control-label {
>>+    margin-top: 10px!important;
>>+    border : 10px ChangeMe #C6C6C6;
>>+}
> 
> These two are the same. Please pick only one. I propose the name
> "common" because it is how CSS rules are written most commonly.

Right, I was distracted

>>+IPATTERN("css",
>>+	 "!^.*;\n"
> 
> Is there a difference between this and "!;\n"? Is it necessary to
> anchor the pattern at the beginning of the line?
> 
> In the commit message you talk about colon (':'), but you actually
> use a semicolon (';'). Thinking a bit more about it, rejecting lines
> with either one would be even better. Consider this case (without
> the indentation):
> 
>    h1 {
>    color:
>    red;
>    }
> 
> (New test case, hint, hint!) Therefore, it could be: "![:;]\n".

You're right! (plus the potential trailing spaces)

>>+	 "^[_a-z0-9].*$",
>>+	 /* -- */
>>+	 /*
>>+	  * This regex comes from W3C CSS specs. Should theoretically also
>>+	  * allow ISO 10646 characters U+00A0 and higher,
>>+	  * but they are not handled in this regex.
>>+	  */
>>+	 "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
> 
> Drop A-F.
> 
>>+	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> 
> Here, too: it is an IPATTERN.

Here I have to disagree (with you and Junio): the IPATTERN is
case-insensitive only on the "pattern" regex, not the "word_regex"
regex. It can be seen in the macro definition, and a quick test confirm
that (and we can see that the fortran word_regex, for example, bother
with uppercase and lowercase even thought it use IPATTERN).
On the identifier line, I have "A-F" instead of "A-Z" though

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] userdiff: add built-in pattern for CSS
  2016-06-02 22:48   ` William Duclot
  2016-06-02 23:07     ` Junio C Hamano
  2016-06-03  5:52     ` Johannes Sixt
@ 2016-06-03 12:32     ` William Duclot
  2016-06-03 21:31       ` Johannes Sixt
  2 siblings, 1 reply; 21+ messages in thread
From: William Duclot @ 2016-06-03 12:32 UTC (permalink / raw)
  To: git
  Cc: simon.rabourg, antoine.queru, francois.beutin, j6t, gitster,
	Matthieu Moy

CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Logic behind the "pattern" regex is:
    1. reject lines ending with a colon/semicolon (properties)
    2. if a line begins with a name in column 1, pick the whole line

Credits to Johannes Sixt (j6t@kdbg.org) for the pattern regex and most
of the tests.

Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
---
Changes since V3:
    - Add a few tests
    - Remove a redondant test
    - Handle trailing spaces
    - Reword in doc
    - Improvement of the pattern regex

 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh        |  1 +
 t/t4018/css-brace-in-col-1      |  5 +++++
 t/t4018/css-colon-eol           |  4 ++++
 t/t4018/css-colon-selector      |  5 +++++
 t/t4018/css-common              |  4 ++++
 t/t4018/css-long-selector-list  |  6 ++++++
 t/t4018/css-prop-sans-indent    |  5 +++++
 t/t4018/css-short-selector-list |  4 ++++
 t/t4018/css-trailing-space      |  5 +++++
 t/t4034-diff-words.sh           |  1 +
 t/t4034/css/expect              | 16 ++++++++++++++++
 t/t4034/css/post                | 10 ++++++++++
 t/t4034/css/pre                 | 10 ++++++++++
 userdiff.c                      | 12 ++++++++++++
 15 files changed, 90 insertions(+)
 create mode 100644 t/t4018/css-brace-in-col-1
 create mode 100644 t/t4018/css-colon-eol
 create mode 100644 t/t4018/css-colon-selector
 create mode 100644 t/t4018/css-common
 create mode 100644 t/t4018/css-long-selector-list
 create mode 100644 t/t4018/css-prop-sans-indent
 create mode 100644 t/t4018/css-short-selector-list
 create mode 100644 t/t4018/css-trailing-space
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..8882a3e 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for cascading style sheets.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
 	bibtex
 	cpp
 	csharp
+	css
 	fortran
 	fountain
 	html
diff --git a/t/t4018/css-brace-in-col-1 b/t/t4018/css-brace-in-col-1
new file mode 100644
index 0000000..7831577
--- /dev/null
+++ b/t/t4018/css-brace-in-col-1
@@ -0,0 +1,5 @@
+RIGHT label.control-label
+{
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-colon-eol b/t/t4018/css-colon-eol
new file mode 100644
index 0000000..5a30553
--- /dev/null
+++ b/t/t4018/css-colon-eol
@@ -0,0 +1,4 @@
+RIGHT h1 {
+color:
+ChangeMe;
+}
diff --git a/t/t4018/css-colon-selector b/t/t4018/css-colon-selector
new file mode 100644
index 0000000..c6d71fb
--- /dev/null
+++ b/t/t4018/css-colon-selector
@@ -0,0 +1,5 @@
+RIGHT a:hover {
+    margin-top:
+    10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-common b/t/t4018/css-common
new file mode 100644
index 0000000..84ed754
--- /dev/null
+++ b/t/t4018/css-common
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-long-selector-list b/t/t4018/css-long-selector-list
new file mode 100644
index 0000000..7ccd25d
--- /dev/null
+++ b/t/t4018/css-long-selector-list
@@ -0,0 +1,6 @@
+p.header,
+label.control-label,
+div ul#RIGHT {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-prop-sans-indent b/t/t4018/css-prop-sans-indent
new file mode 100644
index 0000000..a9e3c86
--- /dev/null
+++ b/t/t4018/css-prop-sans-indent
@@ -0,0 +1,5 @@
+RIGHT, label.control-label {
+margin-top: 10px!important;
+padding: 0;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-short-selector-list b/t/t4018/css-short-selector-list
new file mode 100644
index 0000000..6a0bdee
--- /dev/null
+++ b/t/t4018/css-short-selector-list
@@ -0,0 +1,4 @@
+label.control, div ul#RIGHT {
+    margin-top: 10px!important;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-trailing-space b/t/t4018/css-trailing-space
new file mode 100644
index 0000000..32b5606
--- /dev/null
+++ b/t/t4018/css-trailing-space
@@ -0,0 +1,5 @@
+RIGHT label.control-label {
+    margin:10px;   
+    padding:10px;
+    border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect b/t/t4034/css/expect
new file mode 100644
index 0000000..ed10393
--- /dev/null
+++ b/t/t4034/css/expect
@@ -0,0 +1,16 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index b8ae0bb..fe500b7 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,10 +1,10 @@<RESET>
+.<RED>class-form<RESET><GREEN>other-form<RESET> label.control-label {
+    margin-top: <RED>10<RESET><GREEN>15<RESET>px!important;
+    border : 10px <RED>dashed<RESET><GREEN>dotted<RESET> #C6C6C6;
+}<RESET>
+<RED>#CCCCCC<RESET><GREEN>#CCCCCB<RESET>
+10em<RESET>
+<RED>padding-bottom<RESET><GREEN>margin-left<RESET>
+150<RED>px<RESET><GREEN>em<RESET>
+10px
+<RED>!important<RESET>
+<RED>div<RESET><GREEN>li<RESET>.class#id
diff --git a/t/t4034/css/post b/t/t4034/css/post
new file mode 100644
index 0000000..fe500b7
--- /dev/null
+++ b/t/t4034/css/post
@@ -0,0 +1,10 @@
+.other-form label.control-label {
+    margin-top: 15px!important;
+    border : 10px dotted #C6C6C6;
+}
+#CCCCCB
+10em
+margin-left
+150em
+10px
+li.class#id
diff --git a/t/t4034/css/pre b/t/t4034/css/pre
new file mode 100644
index 0000000..b8ae0bb
--- /dev/null
+++ b/t/t4034/css/pre
@@ -0,0 +1,10 @@
+.class-form label.control-label {
+    margin-top: 10px!important;
+    border : 10px dashed #C6C6C6;
+}
+#CCCCCC
+10em
+padding-bottom
+150px
+10px!important
+div.class#id
diff --git a/userdiff.c b/userdiff.c
index 6bf2505..2125d6d 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -148,6 +148,18 @@ PATTERNS("csharp",
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+IPATTERN("css",
+	 "![:;][[:space:]]*$\n"
+	 "^[_a-z0-9].*$",
+	 /* -- */
+	 /*
+	  * This regex comes from W3C CSS specs. Should theoretically also
+	  * allow ISO 10646 characters U+00A0 and higher,
+	  * but they are not handled in this regex.
+	  */
+	 "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
+	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
+),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
-- 
2.9.0.rc1.1.geac644e

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-03  9:45       ` William Duclot
@ 2016-06-03 15:50         ` Junio C Hamano
  2016-06-06  7:28           ` William Duclot
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2016-06-03 15:50 UTC (permalink / raw)
  To: William Duclot
  Cc: Johannes Sixt, git, simon.rabourg, antoine.queru, francois.beutin,
	Matthieu Moy

William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:

> Here I have to disagree (with you and Junio): the IPATTERN is
> case-insensitive only on the "pattern" regex, not the "word_regex"
> regex.

Ahh, OK.  Obviously both of us overlooked that.  Thanks for pushing
back.

> On the identifier line, I have "A-F" instead of "A-Z" though

Yeah, that does need updating.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-03 12:32     ` William Duclot
@ 2016-06-03 21:31       ` Johannes Sixt
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Sixt @ 2016-06-03 21:31 UTC (permalink / raw)
  To: William Duclot
  Cc: git, simon.rabourg, antoine.queru, francois.beutin, gitster,
	Matthieu Moy

Am 03.06.2016 um 14:32 schrieb William Duclot:
> CSS is widely used, motivating it being included as a built-in pattern.
>
> It must be noted that the word_regex for CSS (i.e. the regex defining
> what is a word in the language) does not consider '.' and '#' characters
> (in CSS selectors) to be part of the word. This behavior is documented
> by the test t/t4018/css-rule.
> The logic behind this behavior is the following: identifiers in CSS
> selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
> character are not part of the identifier, but an indicator of the nature
> of the identifier in HTML/XML (class or id). Diffing ".class1" and
> ".class2" must show that the class name is changed, but we still are
> selecting a class.
>
> Logic behind the "pattern" regex is:
>      1. reject lines ending with a colon/semicolon (properties)
>      2. if a line begins with a name in column 1, pick the whole line
>
> Credits to Johannes Sixt (j6t@kdbg.org) for the pattern regex and most
> of the tests.
>
> Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
> Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
> ---
> Changes since V3:
>      - Add a few tests
>      - Remove a redondant test
>      - Handle trailing spaces
>      - Reword in doc
>      - Improvement of the pattern regex

Thanks, I think we can take this version.

-- Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-03 15:50         ` Junio C Hamano
@ 2016-06-06  7:28           ` William Duclot
  2016-06-06 18:00             ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: William Duclot @ 2016-06-06  7:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, git, simon.rabourg, antoine.queru, francois.beutin,
	Matthieu Moy

On Fri, Jun 03, 2016 at 08:50:50AM -0700, Junio C Hamano wrote:
> William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:
> 
> > Here I have to disagree (with you and Junio): the IPATTERN is
> > case-insensitive only on the "pattern" regex, not the "word_regex"
> > regex.
> 
> Ahh, OK.  Obviously both of us overlooked that.  Thanks for pushing
> back.
> 
> > On the identifier line, I have "A-F" instead of "A-Z" though
> 
> Yeah, that does need updating.

Note that I sent a V4 :)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-06  7:28           ` William Duclot
@ 2016-06-06 18:00             ` Junio C Hamano
  2016-06-06 20:45               ` William Duclot
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2016-06-06 18:00 UTC (permalink / raw)
  To: William Duclot
  Cc: Johannes Sixt, git, simon.rabourg, antoine.queru, francois.beutin,
	Matthieu Moy

William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:

> On Fri, Jun 03, 2016 at 08:50:50AM -0700, Junio C Hamano wrote:
>> William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:
>> 
>> > Here I have to disagree (with you and Junio): the IPATTERN is
>> > case-insensitive only on the "pattern" regex, not the "word_regex"
>> > regex.
>> 
>> Ahh, OK.  Obviously both of us overlooked that.  Thanks for pushing
>> back.
>> 
>> > On the identifier line, I have "A-F" instead of "A-Z" though
>> 
>> Yeah, that does need updating.
>
> Note that I sent a V4 :)

Yup, thanks.  Isn't that what I queued as 0719f3ee (userdiff: add
built-in pattern for CSS, 2016-06-03)?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-06 18:00             ` Junio C Hamano
@ 2016-06-06 20:45               ` William Duclot
  2016-06-06 20:55                 ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: William Duclot @ 2016-06-06 20:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, git, simon.rabourg, antoine.queru, francois.beutin,
	Matthieu Moy

On Mon, Jun 06, 2016 at 11:00:38AM -0700, Junio C Hamano wrote:
> William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:
> 
> > On Fri, Jun 03, 2016 at 08:50:50AM -0700, Junio C Hamano wrote:
> >> William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:
> >> 
> >> > Here I have to disagree (with you and Junio): the IPATTERN is
> >> > case-insensitive only on the "pattern" regex, not the "word_regex"
> >> > regex.
> >> 
> >> Ahh, OK.  Obviously both of us overlooked that.  Thanks for pushing
> >> back.
> >> 
> >> > On the identifier line, I have "A-F" instead of "A-Z" though
> >> 
> >> Yeah, that does need updating.
> >
> > Note that I sent a V4 :)
> 
> Yup, thanks.  Isn't that what I queued as 0719f3ee (userdiff: add
> built-in pattern for CSS, 2016-06-03)?

It is, my bad

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] userdiff: add built-in pattern for CSS
  2016-06-06 20:45               ` William Duclot
@ 2016-06-06 20:55                 ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2016-06-06 20:55 UTC (permalink / raw)
  To: William Duclot
  Cc: Johannes Sixt, git, simon.rabourg, antoine.queru, francois.beutin,
	Matthieu Moy

William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:

>> Yup, thanks.  Isn't that what I queued as 0719f3ee (userdiff: add
>> built-in pattern for CSS, 2016-06-03)?
>
> It is, my bad

Not your bad at all.  I am leaky and was asking you to double check;
it was entirely possible that I missed your even newer patch and
what was queued was the second from the last one.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-06-06 20:55 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-20 13:28 [PATCH] userdiff: add built-in pattern for CSS William Duclot
2016-05-20 22:37 ` Junio C Hamano
2016-05-24 14:25 ` William Duclot
2016-05-24 19:06   ` Junio C Hamano
2016-05-24 22:12     ` William Duclot
2016-05-24 22:18       ` Junio C Hamano
2016-05-26 21:11   ` Johannes Sixt
2016-05-27  7:48     ` William Duclot
2016-06-02 22:48   ` William Duclot
2016-06-02 23:07     ` Junio C Hamano
2016-06-03  5:52     ` Johannes Sixt
2016-06-03  6:41       ` Matthieu Moy
2016-06-03  6:56         ` Johannes Sixt
2016-06-03  9:45       ` William Duclot
2016-06-03 15:50         ` Junio C Hamano
2016-06-06  7:28           ` William Duclot
2016-06-06 18:00             ` Junio C Hamano
2016-06-06 20:45               ` William Duclot
2016-06-06 20:55                 ` Junio C Hamano
2016-06-03 12:32     ` William Duclot
2016-06-03 21:31       ` Johannes Sixt

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).