git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
* [PATCH v3] userdiff: add built-in pattern for rust
@ 2019-05-20 17:04 marcandre.lureau
  2019-05-20 19:52 ` Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: marcandre.lureau @ 2019-05-20 17:04 UTC (permalink / raw)
  To: git; +Cc: j6t, Marc-André Lureau, Marc-André Lureau

From: Marc-André Lureau <mlureau@redhat.com>

This adds xfuncname and word_regex patterns for Rust, a quite
popular programming language. It also includes test cases for the
xfuncname regex (t4018) and updated documentation.

The word_regex pattern finds identifiers, integers, floats and
operators, according to the Rust Reference Book.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 Documentation/gitattributes.txt | 2 ++
 t/t4018-diff-funcname.sh        | 1 +
 t/t4018/rust-fn                 | 5 +++++
 t/t4018/rust-impl               | 5 +++++
 t/t4018/rust-struct             | 5 +++++
 t/t4018/rust-trait              | 5 +++++
 t/t4018/rust-unsafe             | 6 ++++++
 userdiff.c                      | 6 ++++++
 8 files changed, 35 insertions(+)
 create mode 100644 t/t4018/rust-fn
 create mode 100644 t/t4018/rust-impl
 create mode 100644 t/t4018/rust-struct
 create mode 100644 t/t4018/rust-trait
 create mode 100644 t/t4018/rust-unsafe

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4fb20cd0e9..07da08fb27 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -833,6 +833,8 @@ patterns are available:
 
 - `ruby` suitable for source code in the Ruby language.
 
+- `rust` suitable for source code in the Rust language.
+
 - `tex` suitable for source code for LaTeX documents.
 
 
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 22f9f88f0a..9261d6d3a0 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -43,6 +43,7 @@ diffpatterns="
 	php
 	python
 	ruby
+	rust
 	tex
 	custom1
 	custom2
diff --git a/t/t4018/rust-fn b/t/t4018/rust-fn
new file mode 100644
index 0000000000..cbe02155f1
--- /dev/null
+++ b/t/t4018/rust-fn
@@ -0,0 +1,5 @@
+pub(self) fn RIGHT<T>(x: &[T]) where T: Debug {
+    let _ = x;
+    // a comment
+    let a = ChangeMe;
+}
diff --git a/t/t4018/rust-impl b/t/t4018/rust-impl
new file mode 100644
index 0000000000..09df3cd93b
--- /dev/null
+++ b/t/t4018/rust-impl
@@ -0,0 +1,5 @@
+impl<'a, T: AsRef<[u8]>>  std::RIGHT for Git<'a> {
+
+    pub fn ChangeMe(&self) -> () {
+    }
+}
diff --git a/t/t4018/rust-struct b/t/t4018/rust-struct
new file mode 100644
index 0000000000..76aff1c0d8
--- /dev/null
+++ b/t/t4018/rust-struct
@@ -0,0 +1,5 @@
+#[derive(Debug)]
+pub(super) struct RIGHT<'a> {
+    name: &'a str,
+    age: ChangeMe,
+}
diff --git a/t/t4018/rust-trait b/t/t4018/rust-trait
new file mode 100644
index 0000000000..ea397f09ed
--- /dev/null
+++ b/t/t4018/rust-trait
@@ -0,0 +1,5 @@
+unsafe trait RIGHT<T> {
+    fn len(&self) -> u32;
+    fn ChangeMe(&self, n: u32) -> T;
+    fn iter<F>(&self, f: F) where F: Fn(T);
+}
diff --git a/t/t4018/rust-unsafe b/t/t4018/rust-unsafe
new file mode 100644
index 0000000000..fd4661a934
--- /dev/null
+++ b/t/t4018/rust-unsafe
@@ -0,0 +1,6 @@
+unsafe fn RIGHT(inc: u32) {
+    unsafe {
+        // don't catch unsafe block
+        ChangeMe += inc;
+    }
+}
diff --git a/userdiff.c b/userdiff.c
index 3a78fbf504..e45b5920c6 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -130,6 +130,12 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
 	 "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
 	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
+PATTERNS("rust",
+	 "^[\t ]*((pub(\\([^\\)]+\\))?[\t ]+)?((async|const|unsafe|extern([\t ]+\"[^\"]+\"))[\t ]+)?(struct|enum|union|mod|trait|fn|impl(<.+>)?)[ \t]+[^;]*)$",
+	 /* -- */
+	 "[a-zA-Z_][a-zA-Z0-9_]*"
+	 "|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?"
+	 "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"),
 PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 "[={}\"]|[^={}\" \t]+"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",

base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6
-- 
2.22.0.rc1.1.g079e7d2849.dirty


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] userdiff: add built-in pattern for rust
  2019-05-20 17:04 [PATCH v3] userdiff: add built-in pattern for rust marcandre.lureau
@ 2019-05-20 19:52 ` Johannes Sixt
  2019-05-21 10:57   ` Marc-André Lureau
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2019-05-20 19:52 UTC (permalink / raw)
  To: marcandre.lureau; +Cc: git, Marc-André Lureau

Am 20.05.19 um 19:04 schrieb marcandre.lureau@redhat.com:
> From: Marc-André Lureau <mlureau@redhat.com>
> 
> This adds xfuncname and word_regex patterns for Rust, a quite
> popular programming language. It also includes test cases for the
> xfuncname regex (t4018) and updated documentation.
> 
> The word_regex pattern finds identifiers, integers, floats and
> operators, according to the Rust Reference Book.

This looks very good. I have a few questions regarding the hunk header
regex.

> diff --git a/userdiff.c b/userdiff.c
> index 3a78fbf504..e45b5920c6 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -130,6 +130,12 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
>  	 "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
>  	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
>  	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
> +PATTERNS("rust",
> +	 "^[\t ]*((pub(\\([^\\)]+\\))?[\t ]+)?((async|const|unsafe|extern([\t ]+\"[^\"]+\"))[\t ]+)?(struct|enum|union|mod|trait|fn|impl(<.+>)?)[ \t]+[^;]*)$",

This pattern matches only if there is no semicolon behind the signal
words on the line. Is that important? Can you show a (test) case where a
line with a semicolon would be picked incorrectly if '[^;]*' were
simplified to '.*'?

You permit whitespace at the beginning of an anchor line. I guess that
is to catch nested definitions. Or is it common style to write indented
code? Can you show a test case where this makes sense?

Would it be sufficient to simplify

    (struct|enum|union|mod|trait|fn|impl(<.+>)?)[ \t]+
to
    (struct|enum|union|mod|trait|fn|impl)[< \t]+

as it is only important to exclude identifiers that start with these
keywords.

> +	 /* -- */
> +	 "[a-zA-Z_][a-zA-Z0-9_]*"
> +	 "|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?"
> +	 "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"),
>  PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
>  	 "[={}\"]|[^={}\" \t]+"),
>  PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
> 
> base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] userdiff: add built-in pattern for rust
  2019-05-20 19:52 ` Johannes Sixt
@ 2019-05-21 10:57   ` Marc-André Lureau
  2019-05-28 16:34     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Marc-André Lureau @ 2019-05-21 10:57 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Hi

On Mon, May 20, 2019 at 9:52 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 20.05.19 um 19:04 schrieb marcandre.lureau@redhat.com:
> > From: Marc-André Lureau <mlureau@redhat.com>
> >
> > This adds xfuncname and word_regex patterns for Rust, a quite
> > popular programming language. It also includes test cases for the
> > xfuncname regex (t4018) and updated documentation.
> >
> > The word_regex pattern finds identifiers, integers, floats and
> > operators, according to the Rust Reference Book.
>
> This looks very good. I have a few questions regarding the hunk header
> regex.
>
> > diff --git a/userdiff.c b/userdiff.c
> > index 3a78fbf504..e45b5920c6 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -130,6 +130,12 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
> >        "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
> >        "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
> >        "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
> > +PATTERNS("rust",
> > +      "^[\t ]*((pub(\\([^\\)]+\\))?[\t ]+)?((async|const|unsafe|extern([\t ]+\"[^\"]+\"))[\t ]+)?(struct|enum|union|mod|trait|fn|impl(<.+>)?)[ \t]+[^;]*)$",
>
> This pattern matches only if there is no semicolon behind the signal
> words on the line. Is that important? Can you show a (test) case where a
> line with a semicolon would be picked incorrectly if '[^;]*' were
> simplified to '.*'?


Ok, I am adding:

trait RIGHT {

    fn new(name: &'static str) -> Self;

    fn ChangeMe(&self) {
        // should skip "new", and return trait name
    }
}

> You permit whitespace at the beginning of an anchor line. I guess that
> is to catch nested definitions. Or is it common style to write indented
> code? Can you show a test case where this makes sense?
>

sure, I thought it was already covered.

fn foo() {
    fn RIGHT() {
        // must catch nested function
        ChangeMe;
    }
}

(a simpler example would be a method implementation)

> Would it be sufficient to simplify
>
>     (struct|enum|union|mod|trait|fn|impl(<.+>)?)[ \t]+
> to
>     (struct|enum|union|mod|trait|fn|impl)[< \t]+
>
> as it is only important to exclude identifiers that start with these
> keywords.

I think that would be fine, ok I am changing it

>
> > +      /* -- */
> > +      "[a-zA-Z_][a-zA-Z0-9_]*"
> > +      "|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?"
> > +      "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"),
> >  PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
> >        "[={}\"]|[^={}\" \t]+"),
> >  PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
> >
> > base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6
>
> -- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] userdiff: add built-in pattern for rust
  2019-05-21 10:57   ` Marc-André Lureau
@ 2019-05-28 16:34     ` Junio C Hamano
  2019-05-28 20:31       ` Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2019-05-28 16:34 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: Johannes Sixt, git

Marc-André Lureau <marcandre.lureau@redhat.com> writes:

> Ok, I am adding:
> ...
> sure, I thought it was already covered.
> ...
> I think that would be fine, ok I am changing it

Thanks, both.

The previous round has already hit 'next' (which means that we won't
replacing the patch wholesale), so whatever you do, please make the
update relative to / on top of what is queued as d74e7860
("userdiff: add built-in pattern for rust", 2019-05-17).



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] userdiff: add built-in pattern for rust
  2019-05-28 16:34     ` Junio C Hamano
@ 2019-05-28 20:31       ` Johannes Sixt
  2019-05-28 21:01         ` Marc-André Lureau
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2019-05-28 20:31 UTC (permalink / raw)
  To: Junio C Hamano, Marc-André Lureau; +Cc: git

Am 28.05.19 um 18:34 schrieb Junio C Hamano:
> Marc-André Lureau <marcandre.lureau@redhat.com> writes:
> 
>> Ok, I am adding:
>> ...
>> sure, I thought it was already covered.
>> ...
>> I think that would be fine, ok I am changing it
> 
> Thanks, both.
> 
> The previous round has already hit 'next' (which means that we won't
> replacing the patch wholesale), so whatever you do, please make the
> update relative to / on top of what is queued as d74e7860
> ("userdiff: add built-in pattern for rust", 2019-05-17).

Ok. So, Marc-André, would you mind resending an incremental patch,
because the word-regexp that is currently in 'next' would catch certain
expressions that should be multiple words as a single word?

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] userdiff: add built-in pattern for rust
  2019-05-28 20:31       ` Johannes Sixt
@ 2019-05-28 21:01         ` Marc-André Lureau
  2019-05-30 16:44           ` [PATCH] userdiff: two simplifications of patterns " Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Marc-André Lureau @ 2019-05-28 21:01 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git

Hi Johannes

On Tue, May 28, 2019 at 10:31 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 28.05.19 um 18:34 schrieb Junio C Hamano:
> > Marc-André Lureau <marcandre.lureau@redhat.com> writes:
> >
> >> Ok, I am adding:
> >> ...
> >> sure, I thought it was already covered.
> >> ...
> >> I think that would be fine, ok I am changing it
> >
> > Thanks, both.
> >
> > The previous round has already hit 'next' (which means that we won't
> > replacing the patch wholesale), so whatever you do, please make the
> > update relative to / on top of what is queued as d74e7860
> > ("userdiff: add built-in pattern for rust", 2019-05-17).
>
> Ok. So, Marc-André, would you mind resending an incremental patch,
> because the word-regexp that is currently in 'next' would catch certain
> expressions that should be multiple words as a single word?

Beside a few extras tests, the diff is:

@@ -134,11 +134,10 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
         "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
         "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
 PATTERNS("rust",
-        "^[\t ]*((pub(\\([^\\)]+\\))?[\t
]+)?((async|const|unsafe|extern([\t ]+\"[^\"]+\"))[\t
]+)?(struct|enum|union|mod|trait|fn|impl(<.+>)?)[ \t]+[^;]*)$",
+        "^[\t ]*((pub(\\([^\\)]+\\))?[\t
]+)?((async|const|unsafe|extern([\t ]+\"[^\"]+\"))[\t
]+)?(struct|enum|union|mod|trait|fn|impl)[< \t]+[^;]*)$",
         /* -- */
         "[a-zA-Z_][a-zA-Z0-9_]*"
-        "|[-+_0-9.eE]+(f32|f64|u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
-        "|0[box]?[0-9a-fA-F_]+(u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
+        "|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?"
         "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"),

So it is simplifying handling of type parameters, and lowering the
complexity of literal numbers.

Both of these changes were based on your recommendations. Would you
mind sending a follow-up patch yourself?

I can send a seperate patch for the 3 extra tests.

thanks

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] userdiff: two simplifications of patterns for rust
  2019-05-28 21:01         ` Marc-André Lureau
@ 2019-05-30 16:44           ` Johannes Sixt
  2019-05-30 18:59             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2019-05-30 16:44 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: Junio C Hamano, git

- Do not enforce (but assume) syntactic correctness of language
  constructs that go into hunk headers: we only want to ensure that
  the keywords actually are words and not just the initial part of
  some identifier.

- In the word regex, match numbers only when they begin with a digit,
  but then be liberal in what follows, assuming that the text that is
  matched is syntactially correct.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
Am 28.05.19 um 23:01 schrieb Marc-André Lureau:
> Both of these changes were based on your recommendations. Would you
> mind sending a follow-up patch yourself?
> 
> I can send a seperate patch for the 3 extra tests.

So, here it is. Looking forward to seeing a patch with the tests.

 userdiff.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/userdiff.c b/userdiff.c
index 8d7e62e2a5..2bcf105caf 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -131,11 +131,10 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
 	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
 PATTERNS("rust",
-	 "^[\t ]*((pub(\\([^\\)]+\\))?[\t ]+)?((async|const|unsafe|extern([\t ]+\"[^\"]+\"))[\t ]+)?(struct|enum|union|mod|trait|fn|impl(<.+>)?)[ \t]+[^;]*)$",
+	 "^[\t ]*((pub(\\([^\\)]+\\))?[\t ]+)?((async|const|unsafe|extern([\t ]+\"[^\"]+\"))[\t ]+)?(struct|enum|union|mod|trait|fn|impl)[< \t]+[^;]*)$",
 	 /* -- */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
-	 "|[-+_0-9.eE]+(f32|f64|u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
-	 "|0[box]?[0-9a-fA-F_]+(u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
+	 "|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?"
 	 "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"),
 PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 "[={}\"]|[^={}\" \t]+"),
-- 
2.21.0.285.gc38d92e052

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] userdiff: two simplifications of patterns for rust
  2019-05-30 16:44           ` [PATCH] userdiff: two simplifications of patterns " Johannes Sixt
@ 2019-05-30 18:59             ` Ævar Arnfjörð Bjarmason
  2019-05-30 20:32               ` Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-30 18:59 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Marc-André Lureau, Junio C Hamano, git


On Thu, May 30 2019, Johannes Sixt wrote:

> - Do not enforce (but assume) syntactic correctness of language
>   constructs that go into hunk headers: we only want to ensure that
>   the keywords actually are words and not just the initial part of
>   some identifier.
>
> - In the word regex, match numbers only when they begin with a digit,
>   but then be liberal in what follows, assuming that the text that is
>   matched is syntactially correct.

I don't know if this is possible for Rust (but very much suspect so...),
but I think that in general we should aim to be more forgiving than not
with these patterns.

Because, as the history of userdiff.c shows, new keywords get introduced
into these languages, and old git versions survive for a long time. If
the syntax is otherwise fairly regular perhaps we don't need to hardcode
the list of existing keywords?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] userdiff: two simplifications of patterns for rust
  2019-05-30 18:59             ` Ævar Arnfjörð Bjarmason
@ 2019-05-30 20:32               ` Johannes Sixt
  0 siblings, 0 replies; 9+ messages in thread
From: Johannes Sixt @ 2019-05-30 20:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Marc-André Lureau, Junio C Hamano, git

Am 30.05.19 um 20:59 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Thu, May 30 2019, Johannes Sixt wrote:
> 
>> - Do not enforce (but assume) syntactic correctness of language
>>   constructs that go into hunk headers: we only want to ensure that
>>   the keywords actually are words and not just the initial part of
>>   some identifier.
>>
>> - In the word regex, match numbers only when they begin with a digit,
>>   but then be liberal in what follows, assuming that the text that is
>>   matched is syntactially correct.
> 
> I don't know if this is possible for Rust (but very much suspect so...),
> but I think that in general we should aim to be more forgiving than not
> with these patterns.

The C/C++ pattern is actually very forgiving in the hunk header pattern:
It takes every line that begins with an un-indented letter. That works
very well in in C because C does not have nested functions and it is
typical that the function definition lines are not indented. But that
breaks down with C++: indented function definitions are very common;
they happen inside class and namespace definitions. Such functions are
not picked up, and we live with that so far (at least, I do).

> Because, as the history of userdiff.c shows, new keywords get introduced
> into these languages, and old git versions survive for a long time. If
> the syntax is otherwise fairly regular perhaps we don't need to hardcode
> the list of existing keywords?

We are talking about (1) hunk header lines (not something really
important) and (2) programming languages: new keywords don't pop up
every month. Granted, inventing new languages is en vogue these days.
But really, I mean, WTH?

Having available keywords to recognize hunk header candidates helps a
lot. I thought long about a possible pattern for C++, but I gave up,
because the language is so rich and there are no suitable keywords.

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-05-30 20:33 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-20 17:04 [PATCH v3] userdiff: add built-in pattern for rust marcandre.lureau
2019-05-20 19:52 ` Johannes Sixt
2019-05-21 10:57   ` Marc-André Lureau
2019-05-28 16:34     ` Junio C Hamano
2019-05-28 20:31       ` Johannes Sixt
2019-05-28 21:01         ` Marc-André Lureau
2019-05-30 16:44           ` [PATCH] userdiff: two simplifications of patterns " Johannes Sixt
2019-05-30 18:59             ` Ævar Arnfjörð Bjarmason
2019-05-30 20:32               ` Johannes Sixt

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git