git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] userdiff: support Markdown
@ 2020-04-21  1:00 Ash Holland
  2020-04-21  2:22 ` Emma Brooks
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Ash Holland @ 2020-04-21  1:00 UTC (permalink / raw)
  To: git; +Cc: Ash Holland, Junio C Hamano, Boxuan Li, Alban Gruin

It's typical to find Markdown documentation alongside source code, and
having better context for documentation changes is useful; see also
commit 69f9c87d4 (userdiff: add support for Fountain documents,
2015-07-21).

The pattern is based on the CommonMark specification 0.29, section 4.2:
https://spec.commonmark.org/

Only ATX headings are supported, as detecting setext headings would
require printing the line before a pattern matches, or matching a
multiline pattern.

Signed-off-by: Ash Holland <ash@sorrel.sh>
---

If it is indeed possible to match multiline patterns, let me know! I
would love to support setext (underlined) headings with this.

I would also appreciate feedback on the word-diff pattern here, I have
no real idea what should constitute a word in a Markdown document, apart
from that it should probably be similar to the definition given for
Fountain, given that Fountain appears to have somewhat similar inline
syntax to Markdown.

 Documentation/gitattributes.txt       |  2 ++
 t/t4018-diff-funcname.sh              |  1 +
 t/t4018/markdown-heading-indented     |  6 ++++++
 t/t4018/markdown-heading-non-headings | 17 +++++++++++++++++
 userdiff.c                            |  3 +++
 5 files changed, 29 insertions(+)
 create mode 100644 t/t4018/markdown-heading-indented
 create mode 100644 t/t4018/markdown-heading-non-headings

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 508fe713c..2d0a03715 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -824,6 +824,8 @@ patterns are available:
 
 - `java` suitable for source code in the Java language.
 
+- `markdown` suitable for Markdown documents.
+
 - `matlab` suitable for source code in the MATLAB and Octave languages.
 
 - `objc` suitable for source code in the Objective-C language.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 02255a08b..9d0779757 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -38,6 +38,7 @@ diffpatterns="
 	golang
 	html
 	java
+	markdown
 	matlab
 	objc
 	pascal
diff --git a/t/t4018/markdown-heading-indented b/t/t4018/markdown-heading-indented
new file mode 100644
index 000000000..1991c2bd4
--- /dev/null
+++ b/t/t4018/markdown-heading-indented
@@ -0,0 +1,6 @@
+Indented headings are allowed, as long as the indent is no more than 3 spaces.
+
+   ### RIGHT
+
+- something
+- ChangeMe
diff --git a/t/t4018/markdown-heading-non-headings b/t/t4018/markdown-heading-non-headings
new file mode 100644
index 000000000..1f19b91d6
--- /dev/null
+++ b/t/t4018/markdown-heading-non-headings
@@ -0,0 +1,17 @@
+Headings can be right next to other lines of the file:
+# RIGHT
+Indents of more than four spaces make a code block:
+
+    # code comment, not heading
+
+If there's no space after the final hash, it's not a heading:
+
+#hashtag
+
+Sequences of more than 6 hashes don't make a heading:
+
+####### over-enthusiastic heading
+
+So the detected heading should be right up at the start of this file.
+
+ChangeMe
diff --git a/userdiff.c b/userdiff.c
index efbe05e5a..f79adb3a3 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -79,6 +79,9 @@ PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+PATTERNS("markdown",
+	 "^ {0,3}#{1,6}( .*)?$",
+	 "[^ \t-]+"),
 PATTERNS("matlab",
 	 /*
 	  * Octave pattern is mostly the same as matlab, except that '%%%' and
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] userdiff: support Markdown
  2020-04-21  1:00 [PATCH] userdiff: support Markdown Ash Holland
@ 2020-04-21  2:22 ` Emma Brooks
  2020-04-23 23:32   ` Ash Holland
  2020-04-23 18:17 ` Johannes Sixt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Emma Brooks @ 2020-04-21  2:22 UTC (permalink / raw)
  To: Ash Holland; +Cc: git, Junio C Hamano, Boxuan Li, Alban Gruin

On 2020-04-21 02:00:35+0100, Ash Holland wrote:
> I would also appreciate feedback on the word-diff pattern here, I have
> no real idea what should constitute a word in a Markdown document, apart
> from that it should probably be similar to the definition given for
> Fountain, given that Fountain appears to have somewhat similar inline
> syntax to Markdown.

Since Markdown can have raw HTML tags in many variants, it may make
sense to extend the word pattern to "[^<>= \t]+" like HTML's pattern so
tags starting/ending will not be considered part of a word.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] userdiff: support Markdown
  2020-04-21  1:00 [PATCH] userdiff: support Markdown Ash Holland
  2020-04-21  2:22 ` Emma Brooks
@ 2020-04-23 18:17 ` Johannes Sixt
  2020-04-23 23:42   ` Ash Holland
  2020-04-29 23:05 ` [PATCH v2] " Ash Holland
  2020-05-02 13:15 ` [PATCH v3] " Ash Holland
  3 siblings, 1 reply; 15+ messages in thread
From: Johannes Sixt @ 2020-04-23 18:17 UTC (permalink / raw)
  To: Ash Holland; +Cc: git, Junio C Hamano, Boxuan Li, Alban Gruin

Am 21.04.20 um 03:00 schrieb Ash Holland:
> It's typical to find Markdown documentation alongside source code, and
> having better context for documentation changes is useful; see also
> commit 69f9c87d4 (userdiff: add support for Fountain documents,
> 2015-07-21).
> 
> The pattern is based on the CommonMark specification 0.29, section 4.2:
> https://spec.commonmark.org/
> 
> Only ATX headings are supported, as detecting setext headings would
> require printing the line before a pattern matches, or matching a
> multiline pattern.

The patch looks good. I have one question about the patthern, though
(see below).

> Signed-off-by: Ash Holland <ash@sorrel.sh>
> ---
> 
> If it is indeed possible to match multiline patterns, let me know! I
> would love to support setext (underlined) headings with this.

We don't have multi-line matching, unfortunately.

> I would also appreciate feedback on the word-diff pattern here, I have
> no real idea what should constitute a word in a Markdown document, apart
> from that it should probably be similar to the definition given for
> Fountain, given that Fountain appears to have somewhat similar inline
> syntax to Markdown.
> 
>  Documentation/gitattributes.txt       |  2 ++
>  t/t4018-diff-funcname.sh              |  1 +
>  t/t4018/markdown-heading-indented     |  6 ++++++
>  t/t4018/markdown-heading-non-headings | 17 +++++++++++++++++
>  userdiff.c                            |  3 +++
>  5 files changed, 29 insertions(+)
>  create mode 100644 t/t4018/markdown-heading-indented
>  create mode 100644 t/t4018/markdown-heading-non-headings
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 508fe713c..2d0a03715 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -824,6 +824,8 @@ patterns are available:
>  
>  - `java` suitable for source code in the Java language.
>  
> +- `markdown` suitable for Markdown documents.
> +
>  - `matlab` suitable for source code in the MATLAB and Octave languages.
>  
>  - `objc` suitable for source code in the Objective-C language.
> diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
> index 02255a08b..9d0779757 100755
> --- a/t/t4018-diff-funcname.sh
> +++ b/t/t4018-diff-funcname.sh
> @@ -38,6 +38,7 @@ diffpatterns="
>  	golang
>  	html
>  	java
> +	markdown
>  	matlab
>  	objc
>  	pascal
> diff --git a/t/t4018/markdown-heading-indented b/t/t4018/markdown-heading-indented
> new file mode 100644
> index 000000000..1991c2bd4
> --- /dev/null
> +++ b/t/t4018/markdown-heading-indented
> @@ -0,0 +1,6 @@
> +Indented headings are allowed, as long as the indent is no more than 3 spaces.
> +
> +   ### RIGHT
> +
> +- something
> +- ChangeMe
> diff --git a/t/t4018/markdown-heading-non-headings b/t/t4018/markdown-heading-non-headings
> new file mode 100644
> index 000000000..1f19b91d6
> --- /dev/null
> +++ b/t/t4018/markdown-heading-non-headings
> @@ -0,0 +1,17 @@
> +Headings can be right next to other lines of the file:
> +# RIGHT
> +Indents of more than four spaces make a code block:
> +
> +    # code comment, not heading
> +
> +If there's no space after the final hash, it's not a heading:
> +
> +#hashtag
> +
> +Sequences of more than 6 hashes don't make a heading:
> +
> +####### over-enthusiastic heading
> +
> +So the detected heading should be right up at the start of this file.
> +
> +ChangeMe

Nicely done!

> diff --git a/userdiff.c b/userdiff.c
> index efbe05e5a..f79adb3a3 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -79,6 +79,9 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +PATTERNS("markdown",
> +	 "^ {0,3}#{1,6}( .*)?$",

What is the purpose of making the heading text optional? Why would you
want to match a sequence of hash marks without any text following it?

> +	 "[^ \t-]+"),
>  PATTERNS("matlab",
>  	 /*
>  	  * Octave pattern is mostly the same as matlab, except that '%%%' and
> 

-- Hannes

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] userdiff: support Markdown
  2020-04-21  2:22 ` Emma Brooks
@ 2020-04-23 23:32   ` Ash Holland
  2020-04-28 21:57     ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Ash Holland @ 2020-04-23 23:32 UTC (permalink / raw)
  To: Emma Brooks; +Cc: git, Junio C Hamano, Boxuan Li, Alban Gruin

On Tue Apr 21, 2020 at 2:22 AM, Emma Brooks wrote:
> Since Markdown can have raw HTML tags in many variants, it may make
> sense to extend the word pattern to "[^<>= \t]+" like HTML's pattern so
> tags starting/ending will not be considered part of a word.

Good point, I'll update the pattern to that, thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] userdiff: support Markdown
  2020-04-23 18:17 ` Johannes Sixt
@ 2020-04-23 23:42   ` Ash Holland
  2020-04-24 17:21     ` Johannes Sixt
  0 siblings, 1 reply; 15+ messages in thread
From: Ash Holland @ 2020-04-23 23:42 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, Boxuan Li, Alban Gruin

On Thu Apr 23, 2020 at 9:17 PM PST, Johannes Sixt wrote:
> Am 21.04.20 um 03:00 schrieb Ash Holland:
> > diff --git a/userdiff.c b/userdiff.c
> > index efbe05e5a..f79adb3a3 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -79,6 +79,9 @@ PATTERNS("java",
> >  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> >  	 "|[-+*/<>%&^|=!]="
> >  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> > +PATTERNS("markdown",
> > +	 "^ {0,3}#{1,6}( .*)?$",
>
> What is the purpose of making the heading text optional? Why would you
> want to match a sequence of hash marks without any text following it?

Strictly speaking, a markdown heading is allowed to be empty -- see for
example https://spec.commonmark.org/0.29/#example-49. I'm happy to
change it if you think it's more useful to show a previous heading which
contains text than an empty one, though.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] userdiff: support Markdown
  2020-04-23 23:42   ` Ash Holland
@ 2020-04-24 17:21     ` Johannes Sixt
  2020-04-29 12:21       ` Ash Holland
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Sixt @ 2020-04-24 17:21 UTC (permalink / raw)
  To: Ash Holland; +Cc: git, Junio C Hamano, Boxuan Li, Alban Gruin

Am 24.04.20 um 01:42 schrieb Ash Holland:
> On Thu Apr 23, 2020 at 9:17 PM PST, Johannes Sixt wrote:
>> Am 21.04.20 um 03:00 schrieb Ash Holland:
>>> diff --git a/userdiff.c b/userdiff.c
>>> index efbe05e5a..f79adb3a3 100644
>>> --- a/userdiff.c
>>> +++ b/userdiff.c
>>> @@ -79,6 +79,9 @@ PATTERNS("java",
>>>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>>>  	 "|[-+*/<>%&^|=!]="
>>>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
>>> +PATTERNS("markdown",
>>> +	 "^ {0,3}#{1,6}( .*)?$",
>>
>> What is the purpose of making the heading text optional? Why would you
>> want to match a sequence of hash marks without any text following it?
> 
> Strictly speaking, a markdown heading is allowed to be empty -- see for
> example https://spec.commonmark.org/0.29/#example-49. I'm happy to
> change it if you think it's more useful to show a previous heading which
> contains text than an empty one, though.

I don't know what makes sense, I don't write markdown regularly. A quick
check shows that the sequence of hashmarks appears in the hunk header.
Is that useful? (A genuine question!)

-- Hannes

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] userdiff: support Markdown
  2020-04-23 23:32   ` Ash Holland
@ 2020-04-28 21:57     ` Junio C Hamano
  2020-04-29 12:12       ` Ash Holland
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2020-04-28 21:57 UTC (permalink / raw)
  To: Ash Holland; +Cc: Emma Brooks, git, Boxuan Li, Alban Gruin

"Ash Holland" <ash@sorrel.sh> writes:

> On Tue Apr 21, 2020 at 2:22 AM, Emma Brooks wrote:
>> Since Markdown can have raw HTML tags in many variants, it may make
>> sense to extend the word pattern to "[^<>= \t]+" like HTML's pattern so
>> tags starting/ending will not be considered part of a word.
>
> Good point, I'll update the pattern to that, thanks!

I just marked the topic as "expecting a reroll" in the "What's
cooking" report I have been preparing, but has something happened
after this exchange?

No need to rush, but we'll be closing the acceptance of new features
in three weeks for this cycle, so we won't have infinite amount of
time, either.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] userdiff: support Markdown
  2020-04-28 21:57     ` Junio C Hamano
@ 2020-04-29 12:12       ` Ash Holland
  0 siblings, 0 replies; 15+ messages in thread
From: Ash Holland @ 2020-04-29 12:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Emma Brooks, git, Boxuan Li, Alban Gruin

On Tue Apr 28, 2020 at 3:57 PM BST, Junio C Hamano wrote:
> "Ash Holland" <ash@sorrel.sh> writes:
>
> > On Tue Apr 21, 2020 at 2:22 AM, Emma Brooks wrote:
> >> Since Markdown can have raw HTML tags in many variants, it may make
> >> sense to extend the word pattern to "[^<>= \t]+" like HTML's pattern so
> >> tags starting/ending will not be considered part of a word.
> >
> > Good point, I'll update the pattern to that, thanks!
>
> I just marked the topic as "expecting a reroll" in the "What's
> cooking" report I have been preparing, but has something happened
> after this exchange?

You've not missed anything, sorry, I've been busy with exams -- I'll
prepare a v2 later today.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] userdiff: support Markdown
  2020-04-24 17:21     ` Johannes Sixt
@ 2020-04-29 12:21       ` Ash Holland
  0 siblings, 0 replies; 15+ messages in thread
From: Ash Holland @ 2020-04-29 12:21 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, Boxuan Li, Alban Gruin

On Fri Apr 24, 2020 at 8:21 PM BST, Johannes Sixt wrote:
> Am 24.04.20 um 01:42 schrieb Ash Holland:
> > On Thu Apr 23, 2020 at 9:17 PM PST, Johannes Sixt wrote:
> >> Am 21.04.20 um 03:00 schrieb Ash Holland:
> >>> diff --git a/userdiff.c b/userdiff.c
> >>> index efbe05e5a..f79adb3a3 100644
> >>> --- a/userdiff.c
> >>> +++ b/userdiff.c
> >>> @@ -79,6 +79,9 @@ PATTERNS("java",
> >>>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> >>>  	 "|[-+*/<>%&^|=!]="
> >>>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> >>> +PATTERNS("markdown",
> >>> +	 "^ {0,3}#{1,6}( .*)?$",
> >>
> >> What is the purpose of making the heading text optional? Why would you
> >> want to match a sequence of hash marks without any text following it?
> > 
> > Strictly speaking, a markdown heading is allowed to be empty -- see for
> > example https://spec.commonmark.org/0.29/#example-49. I'm happy to
> > change it if you think it's more useful to show a previous heading which
> > contains text than an empty one, though.
>
> I don't know what makes sense, I don't write markdown regularly. A quick
> check shows that the sequence of hashmarks appears in the hunk header.
> Is that useful? (A genuine question!)

I think probably it would be more confusing to have Git silently ignore
empty headings, having occasionally written documents with empty
headings in the past (e.g. when I know I want some different sections,
but I don't know what to call them yet). Probably not many people would
ever run into this situation either way, though.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] userdiff: support Markdown
  2020-04-21  1:00 [PATCH] userdiff: support Markdown Ash Holland
  2020-04-21  2:22 ` Emma Brooks
  2020-04-23 18:17 ` Johannes Sixt
@ 2020-04-29 23:05 ` Ash Holland
  2020-04-30 17:31   ` Junio C Hamano
  2020-05-02 13:15 ` [PATCH v3] " Ash Holland
  3 siblings, 1 reply; 15+ messages in thread
From: Ash Holland @ 2020-04-29 23:05 UTC (permalink / raw)
  To: git; +Cc: Ash Holland, Junio C Hamano, Boxuan Li, Alban Gruin

It's typical to find Markdown documentation alongside source code, and
having better context for documentation changes is useful; see also
commit 69f9c87d4 (userdiff: add support for Fountain documents,
2015-07-21).

The pattern is based on the CommonMark specification 0.29, section 4.2:
https://spec.commonmark.org/

Only ATX headings are supported, as detecting setext headings would
require printing the line before a pattern matches, or matching a
multiline pattern. The word-diff pattern is the same as the pattern for
HTML, because many Markdown parsers accept inline HTML.

Signed-off-by: Ash Holland <ash@sorrel.sh>
---
Changes since the previous patch:
- changed the word-diff pattern to match the HTML pattern
- fixed an off-by-one error in the wording of the test

 Documentation/gitattributes.txt       |  2 ++
 t/t4018-diff-funcname.sh              |  1 +
 t/t4018/markdown-heading-indented     |  6 ++++++
 t/t4018/markdown-heading-non-headings | 17 +++++++++++++++++
 userdiff.c                            |  3 +++
 5 files changed, 29 insertions(+)
 create mode 100644 t/t4018/markdown-heading-indented
 create mode 100644 t/t4018/markdown-heading-non-headings

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 508fe713c..2d0a03715 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -824,6 +824,8 @@ patterns are available:
 
 - `java` suitable for source code in the Java language.
 
+- `markdown` suitable for Markdown documents.
+
 - `matlab` suitable for source code in the MATLAB and Octave languages.
 
 - `objc` suitable for source code in the Objective-C language.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 02255a08b..9d0779757 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -38,6 +38,7 @@ diffpatterns="
 	golang
 	html
 	java
+	markdown
 	matlab
 	objc
 	pascal
diff --git a/t/t4018/markdown-heading-indented b/t/t4018/markdown-heading-indented
new file mode 100644
index 000000000..1991c2bd4
--- /dev/null
+++ b/t/t4018/markdown-heading-indented
@@ -0,0 +1,6 @@
+Indented headings are allowed, as long as the indent is no more than 3 spaces.
+
+   ### RIGHT
+
+- something
+- ChangeMe
diff --git a/t/t4018/markdown-heading-non-headings b/t/t4018/markdown-heading-non-headings
new file mode 100644
index 000000000..c479c1a3f
--- /dev/null
+++ b/t/t4018/markdown-heading-non-headings
@@ -0,0 +1,17 @@
+Headings can be right next to other lines of the file:
+# RIGHT
+Indents of four or more spaces make a code block:
+
+    # code comment, not heading
+
+If there's no space after the final hash, it's not a heading:
+
+#hashtag
+
+Sequences of more than 6 hashes don't make a heading:
+
+####### over-enthusiastic heading
+
+So the detected heading should be right up at the start of this file.
+
+ChangeMe
diff --git a/userdiff.c b/userdiff.c
index efbe05e5a..3eaa1df08 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -79,6 +79,9 @@ PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+PATTERNS("markdown",
+	 "^ {0,3}#{1,6}( .*)?$",
+	 "[^<>= \t]+"),
 PATTERNS("matlab",
 	 /*
 	  * Octave pattern is mostly the same as matlab, except that '%%%' and
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] userdiff: support Markdown
  2020-04-29 23:05 ` [PATCH v2] " Ash Holland
@ 2020-04-30 17:31   ` Junio C Hamano
  2020-05-01 11:49     ` Ash Holland
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2020-04-30 17:31 UTC (permalink / raw)
  To: Ash Holland; +Cc: git, Boxuan Li, Alban Gruin

Ash Holland <ash@sorrel.sh> writes:

> It's typical to find Markdown documentation alongside source code, and
> having better context for documentation changes is useful; see also
> commit 69f9c87d4 (userdiff: add support for Fountain documents,
> 2015-07-21).
>
> The pattern is based on the CommonMark specification 0.29, section 4.2:
> https://spec.commonmark.org/
>
> Only ATX headings are supported, as detecting setext headings would
> require printing the line before a pattern matches, or matching a
> multiline pattern. The word-diff pattern is the same as the pattern for
> HTML, because many Markdown parsers accept inline HTML.

> +PATTERNS("markdown",
> +	 "^ {0,3}#{1,6}( .*)?$",

This is "possibly just a bit indented run of up to 6 hashes, either
ending the line by itself or if some text follows, there must be a
SP after the hashes".

If I had a line that has a hash, HT and then "Hello, world", would
everybody's markdown implementation reject it as a header, because
the whitespace after the run of hashes is not a SP?

Also, allowing only the hashes might be spec-compliant, but how
useful would it be to see just a sequence of 4 hashes without any
text after "@@ -100,5, +100,6 @@" in the diff output?

Taking all that together, my suspicion is

	"^ {0,3}#{1,6}[ \t]"

i.e. "possibly slightly indented run of 6 hashes, with a whitespace
to catch the headers with real contents and nothing else" might be
more practically useful.  I dunno.

> +	 "[^<>= \t]+"),

This does match the one for HTML.

In any case, let me queue this v2 as-is and see what happens.

Thanks.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] userdiff: support Markdown
  2020-04-30 17:31   ` Junio C Hamano
@ 2020-05-01 11:49     ` Ash Holland
  2020-05-01 14:26       ` Johannes Sixt
  0 siblings, 1 reply; 15+ messages in thread
From: Ash Holland @ 2020-05-01 11:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Boxuan Li, Alban Gruin

On Thu Apr 30, 2020 at 11:31 AM BST, Junio C Hamano wrote:
> This is "possibly just a bit indented run of up to 6 hashes, either
> ending the line by itself or if some text follows, there must be a
> SP after the hashes".
>
> If I had a line that has a hash, HT and then "Hello, world", would
> everybody's markdown implementation reject it as a header, because
> the whitespace after the run of hashes is not a SP?

Good point, Pandoc at least accepts a tab as well as a space there.
(Some markdown implementations don't even require the whitespace, so any
line starting with #{1,6} would be a heading, but that runs into
problems with paragraphs containing a line starting with "#", which is
not uncommon.)

> Also, allowing only the hashes might be spec-compliant, but how
> useful would it be to see just a sequence of 4 hashes without any
> text after "@@ -100,5, +100,6 @@" in the diff output?
>
> Taking all that together, my suspicion is
>
> "^ {0,3}#{1,6}[ \t]"
>
> i.e. "possibly slightly indented run of 6 hashes, with a whitespace
> to catch the headers with real contents and nothing else" might be
> more practically useful. I dunno.

Sure, that looks plausible. I don't have a strong opinion on whether
it's more useful to be consistent (and show the last heading, even if it
doesn't contain any text) or to try as hard as possible to just show
"some text", even if it's not the last heading, but two people have now
suggested changing it, so I'll submit a v3 with your suggested pattern.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] userdiff: support Markdown
  2020-05-01 11:49     ` Ash Holland
@ 2020-05-01 14:26       ` Johannes Sixt
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Sixt @ 2020-05-01 14:26 UTC (permalink / raw)
  To: Ash Holland; +Cc: Junio C Hamano, git, Boxuan Li, Alban Gruin

Am 01.05.20 um 13:49 schrieb Ash Holland:
> On Thu Apr 30, 2020 at 11:31 AM BST, Junio C Hamano wrote:
>> Taking all that together, my suspicion is
>>
>> "^ {0,3}#{1,6}[ \t]"
>>
>> i.e. "possibly slightly indented run of 6 hashes, with a whitespace
>> to catch the headers with real contents and nothing else" might be
>> more practically useful. I dunno.
> 
> Sure, that looks plausible. I don't have a strong opinion on whether
> it's more useful to be consistent (and show the last heading, even if it
> doesn't contain any text) or to try as hard as possible to just show
> "some text", even if it's not the last heading, but two people have now
> suggested changing it, so I'll submit a v3 with your suggested pattern.

The pattern above captures only the hashmarks, but not the text of the
header. I suggest to append ".*".

-- Hannes

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] userdiff: support Markdown
  2020-04-21  1:00 [PATCH] userdiff: support Markdown Ash Holland
                   ` (2 preceding siblings ...)
  2020-04-29 23:05 ` [PATCH v2] " Ash Holland
@ 2020-05-02 13:15 ` Ash Holland
  2020-05-02 13:58   ` Johannes Sixt
  3 siblings, 1 reply; 15+ messages in thread
From: Ash Holland @ 2020-05-02 13:15 UTC (permalink / raw)
  To: git; +Cc: Ash Holland, Junio C Hamano, Boxuan Li, Alban Gruin

It's typical to find Markdown documentation alongside source code, and
having better context for documentation changes is useful; see also
commit 69f9c87d4 (userdiff: add support for Fountain documents,
2015-07-21).

The pattern is based on the CommonMark specification 0.29, section 4.2
<https://spec.commonmark.org/> but doesn't match empty headings, as
seeing them in a hunk header is unlikely to be useful.

Only ATX headings are supported, as detecting setext headings would
require printing the line before a pattern matches, or matching a
multiline pattern. The word-diff pattern is the same as the pattern for
HTML, because many Markdown parsers accept inline HTML.

Signed-off-by: Ash Holland <ash@sorrel.sh>
---
 Documentation/gitattributes.txt       |  2 ++
 t/t4018-diff-funcname.sh              |  1 +
 t/t4018/markdown-heading-indented     |  6 ++++++
 t/t4018/markdown-heading-non-headings | 17 +++++++++++++++++
 userdiff.c                            |  3 +++
 5 files changed, 29 insertions(+)
 create mode 100644 t/t4018/markdown-heading-indented
 create mode 100644 t/t4018/markdown-heading-non-headings

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 508fe713c..2d0a03715 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -824,6 +824,8 @@ patterns are available:
 
 - `java` suitable for source code in the Java language.
 
+- `markdown` suitable for Markdown documents.
+
 - `matlab` suitable for source code in the MATLAB and Octave languages.
 
 - `objc` suitable for source code in the Objective-C language.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 02255a08b..9d0779757 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -38,6 +38,7 @@ diffpatterns="
 	golang
 	html
 	java
+	markdown
 	matlab
 	objc
 	pascal
diff --git a/t/t4018/markdown-heading-indented b/t/t4018/markdown-heading-indented
new file mode 100644
index 000000000..1991c2bd4
--- /dev/null
+++ b/t/t4018/markdown-heading-indented
@@ -0,0 +1,6 @@
+Indented headings are allowed, as long as the indent is no more than 3 spaces.
+
+   ### RIGHT
+
+- something
+- ChangeMe
diff --git a/t/t4018/markdown-heading-non-headings b/t/t4018/markdown-heading-non-headings
new file mode 100644
index 000000000..c479c1a3f
--- /dev/null
+++ b/t/t4018/markdown-heading-non-headings
@@ -0,0 +1,17 @@
+Headings can be right next to other lines of the file:
+# RIGHT
+Indents of four or more spaces make a code block:
+
+    # code comment, not heading
+
+If there's no space after the final hash, it's not a heading:
+
+#hashtag
+
+Sequences of more than 6 hashes don't make a heading:
+
+####### over-enthusiastic heading
+
+So the detected heading should be right up at the start of this file.
+
+ChangeMe
diff --git a/userdiff.c b/userdiff.c
index efbe05e5a..069a8284c 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -79,6 +79,9 @@ PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+PATTERNS("markdown",
+	 "^ {0,3}#{1,6}[ \t].*",
+	 "[^<>= \t]+"),
 PATTERNS("matlab",
 	 /*
 	  * Octave pattern is mostly the same as matlab, except that '%%%' and
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] userdiff: support Markdown
  2020-05-02 13:15 ` [PATCH v3] " Ash Holland
@ 2020-05-02 13:58   ` Johannes Sixt
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Sixt @ 2020-05-02 13:58 UTC (permalink / raw)
  To: Ash Holland; +Cc: git, Junio C Hamano, Boxuan Li, Alban Gruin

Am 02.05.20 um 15:15 schrieb Ash Holland:
> It's typical to find Markdown documentation alongside source code, and
> having better context for documentation changes is useful; see also
> commit 69f9c87d4 (userdiff: add support for Fountain documents,
> 2015-07-21).
> 
> The pattern is based on the CommonMark specification 0.29, section 4.2
> <https://spec.commonmark.org/> but doesn't match empty headings, as
> seeing them in a hunk header is unlikely to be useful.
> 
> Only ATX headings are supported, as detecting setext headings would
> require printing the line before a pattern matches, or matching a
> multiline pattern. The word-diff pattern is the same as the pattern for
> HTML, because many Markdown parsers accept inline HTML.
> 
> Signed-off-by: Ash Holland <ash@sorrel.sh>
> ---
>  Documentation/gitattributes.txt       |  2 ++
>  t/t4018-diff-funcname.sh              |  1 +
>  t/t4018/markdown-heading-indented     |  6 ++++++
>  t/t4018/markdown-heading-non-headings | 17 +++++++++++++++++
>  userdiff.c                            |  3 +++
>  5 files changed, 29 insertions(+)
>  create mode 100644 t/t4018/markdown-heading-indented
>  create mode 100644 t/t4018/markdown-heading-non-headings
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 508fe713c..2d0a03715 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -824,6 +824,8 @@ patterns are available:
>  
>  - `java` suitable for source code in the Java language.
>  
> +- `markdown` suitable for Markdown documents.
> +
>  - `matlab` suitable for source code in the MATLAB and Octave languages.
>  
>  - `objc` suitable for source code in the Objective-C language.
> diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
> index 02255a08b..9d0779757 100755
> --- a/t/t4018-diff-funcname.sh
> +++ b/t/t4018-diff-funcname.sh
> @@ -38,6 +38,7 @@ diffpatterns="
>  	golang
>  	html
>  	java
> +	markdown
>  	matlab
>  	objc
>  	pascal
> diff --git a/t/t4018/markdown-heading-indented b/t/t4018/markdown-heading-indented
> new file mode 100644
> index 000000000..1991c2bd4
> --- /dev/null
> +++ b/t/t4018/markdown-heading-indented
> @@ -0,0 +1,6 @@
> +Indented headings are allowed, as long as the indent is no more than 3 spaces.
> +
> +   ### RIGHT
> +
> +- something
> +- ChangeMe
> diff --git a/t/t4018/markdown-heading-non-headings b/t/t4018/markdown-heading-non-headings
> new file mode 100644
> index 000000000..c479c1a3f
> --- /dev/null
> +++ b/t/t4018/markdown-heading-non-headings
> @@ -0,0 +1,17 @@
> +Headings can be right next to other lines of the file:
> +# RIGHT
> +Indents of four or more spaces make a code block:
> +
> +    # code comment, not heading
> +
> +If there's no space after the final hash, it's not a heading:
> +
> +#hashtag
> +
> +Sequences of more than 6 hashes don't make a heading:
> +
> +####### over-enthusiastic heading
> +
> +So the detected heading should be right up at the start of this file.
> +
> +ChangeMe
> diff --git a/userdiff.c b/userdiff.c
> index efbe05e5a..069a8284c 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -79,6 +79,9 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +PATTERNS("markdown",
> +	 "^ {0,3}#{1,6}[ \t].*",
> +	 "[^<>= \t]+"),
>  PATTERNS("matlab",
>  	 /*
>  	  * Octave pattern is mostly the same as matlab, except that '%%%' and
> 

I tested this patch, and it looks good:

Acked-by: Johannes Sixt <j6t@kdbg.org>

-- Hannes

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-05-02 13:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-21  1:00 [PATCH] userdiff: support Markdown Ash Holland
2020-04-21  2:22 ` Emma Brooks
2020-04-23 23:32   ` Ash Holland
2020-04-28 21:57     ` Junio C Hamano
2020-04-29 12:12       ` Ash Holland
2020-04-23 18:17 ` Johannes Sixt
2020-04-23 23:42   ` Ash Holland
2020-04-24 17:21     ` Johannes Sixt
2020-04-29 12:21       ` Ash Holland
2020-04-29 23:05 ` [PATCH v2] " Ash Holland
2020-04-30 17:31   ` Junio C Hamano
2020-05-01 11:49     ` Ash Holland
2020-05-01 14:26       ` Johannes Sixt
2020-05-02 13:15 ` [PATCH v3] " Ash Holland
2020-05-02 13:58   ` Johannes Sixt

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).