git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH] RFC: userdiff: add built-in pattern for rust
From: marcandre.lureau @ 2019-05-15 18:34 UTC
  To: git; +Cc: Marc-André Lureau, Marc-André Lureau

From: Marc-André Lureau <mlureau@redhat.com>

This adds xfuncname and word_regex patterns for Rust, a quite
popular programming language. It also includes test cases for the
xfuncname regex (t4018) and updated documentation.

The word_regex pattern finds identifiers, integers, floats and
operators, according to the Rust Reference Book.

RFC: I don't understand why the funcname is not reported correctly when
there are extra lines, such as the one with the FIXME. Help welcome!

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 Documentation/gitattributes.txt | 2 ++
 t/t4018-diff-funcname.sh        | 1 +
 t/t4018/rust-fn                 | 5 +++++
 t/t4018/rust-struct             | 5 +++++
 t/t4018/rust-trait              | 5 +++++
 userdiff.c                      | 9 +++++++++
 6 files changed, 27 insertions(+)
 create mode 100644 t/t4018/rust-fn
 create mode 100644 t/t4018/rust-struct
 create mode 100644 t/t4018/rust-trait

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4fb20cd0e9..07da08fb27 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -833,6 +833,8 @@ patterns are available:
 
 - `ruby` suitable for source code in the Ruby language.
 
+- `rust` suitable for source code in the Rust language.
+
 - `tex` suitable for source code for LaTeX documents.
 
 
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 22f9f88f0a..9261d6d3a0 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -43,6 +43,7 @@ diffpatterns="
 	php
 	python
 	ruby
+	rust
 	tex
 	custom1
 	custom2
diff --git a/t/t4018/rust-fn b/t/t4018/rust-fn
new file mode 100644
index 0000000000..f450590d6c
--- /dev/null
+++ b/t/t4018/rust-fn
@@ -0,0 +1,5 @@
+pub(self) fn RIGHT<T>(x: &[T]) where T: Debug {
+    let _ = x;
+    // FIXME: extra lines break match?
+    let a = ChangeMe;
+}
diff --git a/t/t4018/rust-struct b/t/t4018/rust-struct
new file mode 100644
index 0000000000..76aff1c0d8
--- /dev/null
+++ b/t/t4018/rust-struct
@@ -0,0 +1,5 @@
+#[derive(Debug)]
+pub(super) struct RIGHT<'a> {
+    name: &'a str,
+    age: ChangeMe,
+}
diff --git a/t/t4018/rust-trait b/t/t4018/rust-trait
new file mode 100644
index 0000000000..ea397f09ed
--- /dev/null
+++ b/t/t4018/rust-trait
@@ -0,0 +1,5 @@
+unsafe trait RIGHT<T> {
+    fn len(&self) -> u32;
+    fn ChangeMe(&self, n: u32) -> T;
+    fn iter<F>(&self, f: F) where F: Fn(T);
+}
diff --git a/userdiff.c b/userdiff.c
index 3a78fbf504..9e1e2fa03f 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -130,6 +130,15 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
 	 "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
 	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
+PATTERNS("rust",
+	 "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?(struct|enum|union|mod)[ \t].*)$\n"
+	 "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?(unsafe[\t ]+)?trait[ \t].*)$\n"
+	 "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?((const|unsafe|extern(([\t ]+)*\"[^)]+\")?)[\t ]+)*fn[ \t].*)$\n",
+	 /* -- */
+	 "[a-zA-Z_][a-zA-Z0-9_]*"
+	 "|[-+_0-9.eE]+(f32|f64|u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
+	 "|0[box]?[0-9a-fA-F_]+(u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
+	 "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"),
 PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 "[={}\"]|[^={}\" \t]+"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",

base-commit: ab15ad1a3b4b04a29415aef8c9afa2f64fc194a2
-- 
2.22.0.rc0.1.g4f1097ba08


* Re: [PATCH] RFC: userdiff: add built-in pattern for rust
From: Johannes Sixt @ 2019-05-16 20:29 UTC
  To: marcandre.lureau; +Cc: git, Marc-André Lureau

On 15.05.19 at 20:34, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <mlureau@redhat.com>
> 
> This adds xfuncname and word_regex patterns for Rust, a quite
> popular programming language. It also includes test cases for the
> xfuncname regex (t4018) and updated documentation.
> 
> The word_regex pattern finds identifiers, integers, floats and
> operators, according to the Rust Reference Book.
> 
> RFC: I don't understand why the funcname is not reported correctly when
> there are extra lines, such as the one with the FIXME. Help welcome!
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  Documentation/gitattributes.txt | 2 ++
>  t/t4018-diff-funcname.sh        | 1 +
>  t/t4018/rust-fn                 | 5 +++++
>  t/t4018/rust-struct             | 5 +++++
>  t/t4018/rust-trait              | 5 +++++

Nice to see tests!

> diff --git a/userdiff.c b/userdiff.c
> index 3a78fbf504..9e1e2fa03f 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -130,6 +130,15 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
>  	 "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
>  	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
>  	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
> +PATTERNS("rust",
> +	 "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?(struct|enum|union|mod)[ \t].*)$\n"
> +	 "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?(unsafe[\t ]+)?trait[ \t].*)$\n"
> +	 "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?((const|unsafe|extern(([\t ]+)*\"[^)]+\")?)[\t ]+)*fn[ \t].*)$\n",

The last \n there is the reason for the test failures: it adds an empty
pattern that matches everywhere and does not capture any text.
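Below is a minimal C sketch of that effect. It uses only POSIX <regex.h> and assumes what the comment above implies: the xfuncname value is split at each newline, every piece is compiled as a separate extended regex, and the first matching piece wins. first_matching_piece() is a hypothetical helper, not git's code. The trailing "\n" leaves a fourth, empty piece. On glibc an empty ERE compiles and matches any line, so a body line such as "    let _ = x;" is picked up. Other regex libraries may reject the empty pattern instead.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* The three xfuncname lines from the patch, without their "\n". */
#define RUST_TYPE  "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?(struct|enum|union|mod)[ \t].*)$"
#define RUST_TRAIT "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?(unsafe[\t ]+)?trait[ \t].*)$"
#define RUST_FN    "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?((const|unsafe|extern(([\t ]+)*\"[^)]+\")?)[\t ]+)*fn[ \t].*)$"

static const char as_posted[] = RUST_TYPE "\n" RUST_TRAIT "\n" RUST_FN "\n";
static const char without_last_nl[] = RUST_TYPE "\n" RUST_TRAIT "\n" RUST_FN;

/*
 * Hypothetical helper: split `value` on '\n' (a trailing '\n' yields an
 * empty last piece), compile each piece as an ERE and return the index of
 * the first piece that matches `line`, or -1. *npieces gets the count.
 */
static int first_matching_piece(const char *value, const char *line,
				int *npieces)
{
	const char *p = value;
	int i = 0, found = -1;

	for (;;) {
		const char *nl = strchr(p, '\n');
		size_t len = nl ? (size_t)(nl - p) : strlen(p);
		char *piece = malloc(len + 1);
		regex_t re;

		if (!piece)
			abort();
		memcpy(piece, p, len);
		piece[len] = '\0';
		/* Only the first match counts; later pieces are just counted. */
		if (found < 0 && regcomp(&re, piece, REG_EXTENDED) == 0) {
			if (regexec(&re, line, 0, NULL, 0) == 0)
				found = i;
			regfree(&re);
		}
		free(piece);
		i++;
		if (!nl)
			break;
		p = nl + 1;
	}
	*npieces = i;
	return found;
}
```

On glibc, first_matching_piece(as_posted, "    let _ = x;", &n) should return 3 (the empty piece) with n set to 4. If the final "\n" is dropped (without_last_nl), only three pieces remain. None of them matches the body line, and the fn line is still matched by the third piece.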

Can we simplify these patterns as in

   ^
   space*
   ( pub ( "(" stuff ")" )? space* )?
   ( struct|enum|union|mod|unsafe|trait|const|extern|fn )
   stuff
   $

You don't have to check for correct syntax rigorously because you can
assume that only correct Rust code will be passed to the patterns.
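The short C sketch below gives one possible reading of this outline as a POSIX extended regex. The names simplified and is_funcname_line() are illustrative only. The outer parentheses form capture group 1, which is the part a hunk header would show. The sketch requires at least one blank after the pub part, where the outline says space*. The three RIGHT lines from the new t4018 files match, and the body lines do not.

```c
#include <regex.h>
#include <stddef.h>

/* ^ space* ( pub ( "(" stuff ")" )? space+ )? keyword space stuff $ */
static const char simplified[] =
	"^[ \t]*((pub(\\([^)]+\\))?[ \t]+)?"
	"(struct|enum|union|mod|unsafe|trait|const|extern|fn)[ \t].*)$";

/* Returns 1 if `line` matches the simplified pattern, 0 otherwise. */
static int is_funcname_line(const char *line)
{
	regex_t re;
	int hit;

	if (regcomp(&re, simplified, REG_EXTENDED) != 0)
		return 0;
	hit = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

Because unsafe is in the keyword list, a line such as "    unsafe {" also matches.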

> +	 /* -- */
> +	 "[a-zA-Z_][a-zA-Z0-9_]*"
> +	 "|[-+_0-9.eE]+(f32|f64|u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"

I assume that

       +e_1.ei8-e_2.eu128

is correct syntax, but not a single token. Yet, your number pattern
would take it as a single word.

> +	 "|0[box]?[0-9a-fA-F_]+(u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"

You should really subsume your number patterns under a single pattern
that requires an initial digit, because you can again assume that only
correct syntax will be shown to the patterns:

	"|[0-9][0-9_a-fA-Fuisxz]*([.][0-9]*([eE][+-]?[0-9]+)?)?"

(very likely, I have mistaken the meaning of f32 and f64 here).

> +	 "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"),
>  PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
>  	 "[={}\"]|[^={}\" \t]+"),
>  PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
> 
> base-commit: ab15ad1a3b4b04a29415aef8c9afa2f64fc194a2
> 

-- Hannes

* Re: [PATCH] RFC: userdiff: add built-in pattern for rust
From: Johannes Sixt @ 2019-05-16 20:46 UTC
  To: marcandre.lureau; +Cc: git, Marc-André Lureau

On 16.05.19 at 22:29, Johannes Sixt wrote:
> On 15.05.19 at 20:34, marcandre.lureau@redhat.com wrote:
>> +	 "[a-zA-Z_][a-zA-Z0-9_]*"
>> +	 "|[-+_0-9.eE]+(f32|f64|u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
> 
> I assume that
> 
>        +e_1.ei8-e_2.eu128

Make that
	+e_1.e_8-e_2.eu128

> is correct syntax, but not a single token. Yet, your number pattern
> would take it as a single word.
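Here is a rough C sketch of how the two number patterns cut this string into words. It makes two assumptions. First, each successive leftmost match of the word regex counts as one word. Second, a catch-all "|[^[:space:]]" alternative is appended to the built-in word regexes (our reading of the PATTERNS macro in userdiff.c). count_words() is a hypothetical helper. With the number alternative as posted, [-+_0-9.eE]+ consumes "+e_1.e_8-e_2.e" and the u128 suffix takes the rest, so the whole string is one word. With the digit-first pattern it splits into +, e_1, ., e_8, -, e_2, . and eu128.

```c
#include <regex.h>

/* Catch-all assumed to be appended to built-in word regexes. */
#define CATCH_ALL "|[^[:space:]]"

/* Identifier and operator alternatives, unchanged from the patch. */
#define IDENT "[a-zA-Z_][a-zA-Z0-9_]*"
#define OPS "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"

/* Word regex with the number alternatives as posted. */
static const char patch_words[] = IDENT
	"|[-+_0-9.eE]+(f32|f64|u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
	"|0[box]?[0-9a-fA-F_]+(u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
	OPS CATCH_ALL;

/* Same, with the single digit-first number pattern from the review. */
static const char suggested_words[] = IDENT
	"|[0-9][0-9_a-fA-Fuisxz]*([.][0-9]*([eE][+-]?[0-9]+)?)?"
	OPS CATCH_ALL;

/* Hypothetical helper: count successive leftmost matches ("words"). */
static int count_words(const char *word_regex, const char *text)
{
	regex_t re;
	regmatch_t m;
	const char *p = text;
	int n = 0;

	if (regcomp(&re, word_regex, REG_EXTENDED) != 0)
		return -1;
	while (*p && regexec(&re, p, 1, &m, 0) == 0) {
		if (m.rm_eo == m.rm_so) {	/* empty match: no word here */
			if (p[m.rm_so] == '\0')
				break;
			p += m.rm_so + 1;
			continue;
		}
		n++;
		p += m.rm_eo;
	}
	regfree(&re);
	return n;
}
```

With the earlier spelling +e_1.ei8-e_2.eu128, the posted pattern stops at the i8 suffix and yields two words: +e_1.ei8 and -e_2.eu128.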

-- Hannes

* Re: [PATCH] RFC: userdiff: add built-in pattern for rust
From: Marc-André Lureau @ 2019-05-16 22:17 UTC
  To: Johannes Sixt; +Cc: git

Hi

On Thu, May 16, 2019 at 10:29 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> On 15.05.19 at 20:34, marcandre.lureau@redhat.com wrote:
> > diff --git a/userdiff.c b/userdiff.c
> > index 3a78fbf504..9e1e2fa03f 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -130,6 +130,15 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
> >        "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
> >        "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
> >        "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
> > +PATTERNS("rust",
> > +      "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?(struct|enum|union|mod)[ \t].*)$\n"
> > +      "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?(unsafe[\t ]+)?trait[ \t].*)$\n"
> > +      "^[\t ]*(((pub|pub\\([^)]+\\))[\t ]+)?((const|unsafe|extern(([\t ]+)*\"[^)]+\")?)[\t ]+)*fn[ \t].*)$\n",
>
> The last \n there is the reason for the test failures: it adds an empty
> pattern that matches everywhere and does not capture any text.

Oops, thanks!

>
> Can we simplify these patterns as in
>
>    ^
>    space*
>    ( pub ( "(" stuff ")" )? space* )?
>    ( struct|enum|union|mod|unsafe|trait|const|extern|fn )
>    stuff
>    $
>
> You don't have to check for correct syntax rigorously because you can
> assume that only correct Rust code will be passed to the patterns.

yes, but with

extern ( space* '"' stuff '"' )?

I'll try that

>
> > +      /* -- */
> > +      "[a-zA-Z_][a-zA-Z0-9_]*"
> > +      "|[-+_0-9.eE]+(f32|f64|u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
>
> I assume that
>
>        +e_1.ei8-e_2.eu128
>
> is correct syntax, but not a single token. Yet, your number pattern
> would take it as a single word.
>
> > +      "|0[box]?[0-9a-fA-F_]+(u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"
>
> You should really subsume your number patterns under a single pattern
> that requires an initial digit, because you can again assume that only
> correct syntax will be shown to the patterns:
>
>         "|[0-9][0-9_a-fA-Fuisxz]*([.][0-9]*([eE][+-]?[0-9]+)?)?"
>
> (very likely, I have mistaken the meaning of f32 and f64 here).

That doesn't capture 0o70, easy to fix.

Then it doesn't capture the examples from the reference manual:
123.0f64;
0.1f64;
0.1f32;
12E+99_f64;
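This small C sketch measures how much of each literal a number pattern matches, starting from the first character. suggested_number is the pattern from the review. revised_number is a hypothetical revision, not something proposed in the thread. It adds o for octal, lets the exponent appear without a fraction, and accepts an f32/f64 suffix. The sketch relies on POSIX leftmost-longest matching, which glibc's regexec provides. The suggested pattern stops after 0, 123.0, 0.1 and 12E respectively.

```c
#include <regex.h>

/* The single number pattern suggested in the review. */
static const char suggested_number[] =
	"[0-9][0-9_a-fA-Fuisxz]*([.][0-9]*([eE][+-]?[0-9]+)?)?";

/*
 * Hypothetical revision, not from the thread: 'o' for octal, a fraction
 * that needs a digit after the dot, an exponent allowed without a
 * fraction, and an optional f32/f64 suffix.
 */
static const char revised_number[] =
	"[0-9][0-9_a-fA-Fiosuxz]*([.][0-9][0-9_]*)?([eE][+-]?[0-9_]+)?(f32|f64)?";

/* Length of the match of `pattern` starting at lit[0], or -1 if none. */
static int leading_match_len(const char *pattern, const char *lit)
{
	regex_t re;
	regmatch_t m;
	int len = -1;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	if (regexec(&re, lit, 1, &m, 0) == 0 && m.rm_so == 0)
		len = (int)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

The revision requires a digit after the dot. This keeps the 1 in a range such as 1..2 from being read as the float 1., at the cost of matching a bare 1. only as 1.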

Thanks for your help!

* Re: [PATCH] RFC: userdiff: add built-in pattern for rust
From: Marc-André Lureau @ 2019-05-16 22:36 UTC
  To: Johannes Sixt; +Cc: git

Hi

On Fri, May 17, 2019 at 12:17 AM Marc-André Lureau
<marcandre.lureau@redhat.com> wrote:
>
> Hi
>
> On Thu, May 16, 2019 at 10:29 PM Johannes Sixt <j6t@kdbg.org> wrote:
> >
> > Can we simplify these patterns as in
> >
> >    ^
> >    space*
> >    ( pub ( "(" stuff ")" )? space* )?
> >    ( struct|enum|union|mod|unsafe|trait|const|extern|fn )
> >    stuff
> >    $
> >
> > You don't have to check for correct syntax rigorously because you can
> > assume that only correct Rust code will be passed to the patterns.
>
> yes, but with
>
> extern ( space* '"' stuff '"' )?
>
> I'll try that
>

Or do you want to capture any line with "extern..." or "unsafe..."?

That's a bit too much, I think, in particular for unsafe, which is
commonly used on its own to open a block (unsafe { ... }).

So perhaps this instead?

^[\t ]*((pub(\([^)]+\))?[\t ]+)?((const|unsafe|extern([\t ]+\"[^\"]+\")?)[\t ]+)?(struct|enum|union|mod|trait|fn)[ \t].*)$

