git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git grep --show-function treats GOTO labels as function names
@ 2020-05-27 22:29 Zach Riggle
  2020-05-27 22:48 ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Zach Riggle @ 2020-05-27 22:29 UTC (permalink / raw)
  To: git

Git grep is an amazing tool, and --show-function makes it very easy to
create reports that contain not only line information, but the
containing function as well.

It looks like there is an issue with how the parser handles "goto"
labels, as it treats them the same as a function name.

In the example below, there is only one function, with two matches,
but using --show-function indicates there are two function names.

$ cat test2.cpp
int main() {
    FOO
exit:
    FOO
}

$ git --version
git version 2.26.2

$ git grep --no-index --show-function -e FOO test2.cpp
test2.cpp=1=int main() {
test2.cpp:2:    FOO
test2.cpp=3=exit:
test2.cpp:4:    FOO


Zach Riggle

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git grep --show-function treats GOTO labels as function names
  2020-05-27 22:29 git grep --show-function treats GOTO labels as function names Zach Riggle
@ 2020-05-27 22:48 ` Jeff King
  2020-05-27 22:54   ` Zach Riggle
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2020-05-27 22:48 UTC (permalink / raw)
  To: Zach Riggle; +Cc: git

On Wed, May 27, 2020 at 05:29:08PM -0500, Zach Riggle wrote:

> It looks like there is an issue with how the parser handles "goto"
> labels, as it treats them the same as a function name.

By default, the function-finding isn't aware of the specific content in
the file. But you can associate extensions with particular types, like:

  $ echo '*.cpp diff=cpp' >~/.gitattributes
  $ git config --global core.attributesFile ~/.gitattributes
  $ git grep --no-index --show-function -e FOO test2.cpp
  test2.cpp=int main() {
  test2.cpp:    FOO
  test2.cpp:    FOO

Usually this is done in-repo, but since your example used --no-index, I
showed how to set up a per-user attribute file. The "diff" attribute
covers both diff and grep (for diff, the hunk headers will also show the
function).

The "cpp" diff regexes are built-in to the git binary. We just don't
associate any filenames by default. You can also add your own; see the
section "Defining a custom hunk-header" from "git help attributes".

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git grep --show-function treats GOTO labels as function names
  2020-05-27 22:48 ` Jeff King
@ 2020-05-27 22:54   ` Zach Riggle
  2020-05-27 23:04     ` Zach Riggle
  0 siblings, 1 reply; 7+ messages in thread
From: Zach Riggle @ 2020-05-27 22:54 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Awesome, thanks!


Zach Riggle

On Wed, May 27, 2020 at 5:48 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, May 27, 2020 at 05:29:08PM -0500, Zach Riggle wrote:
>
> > It looks like there is an issue with how the parser handles "goto"
> > labels, as it treats them the same as a function name.
>
> By default, the function-finding isn't aware of the specific content in
> the file. But you can associate extensions with particular types, like:
>
>   $ echo '*.cpp diff=cpp' >~/.gitattributes
>   $ git config --global core.attributesFile ~/.gitattributes
>   $ git grep --no-index --show-function -e FOO test2.cpp
>   test2.cpp=int main() {
>   test2.cpp:    FOO
>   test2.cpp:    FOO
>
> Usually this is done in-repo, but since your example used --no-index, I
> showed how to set up a per-user attribute file. The "diff" attribute
> covers both diff and grep (for diff, the hunk headers will also show the
> function).
>
> The "cpp" diff regexes are built-in to the git binary. We just don't
> associate any filenames by default. You can also add your own; see the
> section "Defining a custom hunk-header" from "git help attributes".
>
> -Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git grep --show-function treats GOTO labels as function names
  2020-05-27 22:54   ` Zach Riggle
@ 2020-05-27 23:04     ` Zach Riggle
  2020-05-27 23:05       ` Zach Riggle
  2020-05-27 23:16       ` Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: Zach Riggle @ 2020-05-27 23:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git

It looks like that does the trick for "goto" labels, but there are
also some issue on function name parsing with attributes when they are
split onto a second line.

$ cat attr.cpp
int main() __attribute__ ( (no_sanitize("alignment")) )
{
    FOO
}
$ git grep --no-index --show-function -e FOO attr.cpp
attr.cpp=2=__attribute__ ( (no_sanitize("alignment")) )
attr.cpp:4:    FOO

Zach Riggle

On Wed, May 27, 2020 at 5:54 PM Zach Riggle <zachriggle@gmail.com> wrote:
>
> Awesome, thanks!
>
>
> Zach Riggle
>
> On Wed, May 27, 2020 at 5:48 PM Jeff King <peff@peff.net> wrote:
> >
> > On Wed, May 27, 2020 at 05:29:08PM -0500, Zach Riggle wrote:
> >
> > > It looks like there is an issue with how the parser handles "goto"
> > > labels, as it treats them the same as a function name.
> >
> > By default, the function-finding isn't aware of the specific content in
> > the file. But you can associate extensions with particular types, like:
> >
> >   $ echo '*.cpp diff=cpp' >~/.gitattributes
> >   $ git config --global core.attributesFile ~/.gitattributes
> >   $ git grep --no-index --show-function -e FOO test2.cpp
> >   test2.cpp=int main() {
> >   test2.cpp:    FOO
> >   test2.cpp:    FOO
> >
> > Usually this is done in-repo, but since your example used --no-index, I
> > showed how to set up a per-user attribute file. The "diff" attribute
> > covers both diff and grep (for diff, the hunk headers will also show the
> > function).
> >
> > The "cpp" diff regexes are built-in to the git binary. We just don't
> > associate any filenames by default. You can also add your own; see the
> > section "Defining a custom hunk-header" from "git help attributes".
> >
> > -Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git grep --show-function treats GOTO labels as function names
  2020-05-27 23:04     ` Zach Riggle
@ 2020-05-27 23:05       ` Zach Riggle
  2020-05-27 23:16       ` Jeff King
  1 sibling, 0 replies; 7+ messages in thread
From: Zach Riggle @ 2020-05-27 23:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git

I made a mistake copy-pasting; things work correctly when the
attribute is on the same line, but not when it is split:

$ cat attr.cpp
int main()
__attribute__ ( (no_sanitize("alignment")) )
{
    FOO
}

Zach Riggle

On Wed, May 27, 2020 at 6:04 PM Zach Riggle <zachriggle@gmail.com> wrote:
>
> It looks like that does the trick for "goto" labels, but there are
> also some issue on function name parsing with attributes when they are
> split onto a second line.
>
> $ cat attr.cpp
> int main() __attribute__ ( (no_sanitize("alignment")) )
> {
>     FOO
> }
> $ git grep --no-index --show-function -e FOO attr.cpp
> attr.cpp=2=__attribute__ ( (no_sanitize("alignment")) )
> attr.cpp:4:    FOO
>
> Zach Riggle
>
> On Wed, May 27, 2020 at 5:54 PM Zach Riggle <zachriggle@gmail.com> wrote:
> >
> > Awesome, thanks!
> >
> >
> > Zach Riggle
> >
> > On Wed, May 27, 2020 at 5:48 PM Jeff King <peff@peff.net> wrote:
> > >
> > > On Wed, May 27, 2020 at 05:29:08PM -0500, Zach Riggle wrote:
> > >
> > > > It looks like there is an issue with how the parser handles "goto"
> > > > labels, as it treats them the same as a function name.
> > >
> > > By default, the function-finding isn't aware of the specific content in
> > > the file. But you can associate extensions with particular types, like:
> > >
> > >   $ echo '*.cpp diff=cpp' >~/.gitattributes
> > >   $ git config --global core.attributesFile ~/.gitattributes
> > >   $ git grep --no-index --show-function -e FOO test2.cpp
> > >   test2.cpp=int main() {
> > >   test2.cpp:    FOO
> > >   test2.cpp:    FOO
> > >
> > > Usually this is done in-repo, but since your example used --no-index, I
> > > showed how to set up a per-user attribute file. The "diff" attribute
> > > covers both diff and grep (for diff, the hunk headers will also show the
> > > function).
> > >
> > > The "cpp" diff regexes are built-in to the git binary. We just don't
> > > associate any filenames by default. You can also add your own; see the
> > > section "Defining a custom hunk-header" from "git help attributes".
> > >
> > > -Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git grep --show-function treats GOTO labels as function names
  2020-05-27 23:04     ` Zach Riggle
  2020-05-27 23:05       ` Zach Riggle
@ 2020-05-27 23:16       ` Jeff King
  2020-05-28 19:01         ` Johannes Sixt
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2020-05-27 23:16 UTC (permalink / raw)
  To: Zach Riggle; +Cc: git

On Wed, May 27, 2020 at 06:04:21PM -0500, Zach Riggle wrote:

> It looks like that does the trick for "goto" labels, but there are
> also some issue on function name parsing with attributes when they are
> split onto a second line.
> 
> $ cat attr.cpp
> int main() __attribute__ ( (no_sanitize("alignment")) )
> {
>     FOO
> }
> $ git grep --no-index --show-function -e FOO attr.cpp
> attr.cpp=2=__attribute__ ( (no_sanitize("alignment")) )
> attr.cpp:4:    FOO

From your output, I assume the problematic input actually splits the
attribute onto the second line?

I agree that's not ideal. The baked-in regex we use for matching C
function lines is:

  $ git grep -nA4 cpp userdiff.c
  userdiff.c:173:PATTERNS("cpp",
  userdiff.c-174-  /* Jump targets or access declarations */
  userdiff.c-175-  "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])\n"
  userdiff.c-176-  /* functions/methods, variables, and compounds at top level */
  userdiff.c-177-  "^((::[[:space:]]*)?[A-Za-z_].*)$",

so we mistake it for a function name. I'm not sure how easy it is to do
better, though. We can add a line like:

diff --git a/userdiff.c b/userdiff.c
index 1df884ef0b..de8e1a3d72 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -173,6 +173,8 @@ PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
 PATTERNS("cpp",
 	 /* Jump targets or access declarations */
 	 "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])\n"
+	 /* skip over attribute declarations are on their own lines */
+	 "!((::[[:space:]]*))?__attribute__\n"
 	 /* functions/methods, variables, and compounds at top level */
 	 "^((::[[:space:]]*)?[A-Za-z_].*)$",
 	 /* -- */

which works for your case, but would regress:

  __attribute__((whatever) int main()
  {
  FOO
  }

Handling both means skipping past the attribute, not counting it as a
function, and then checking for a plausible function afterwards. That's
a much trickier regex. But if you come up with something that works, I
think we'd be happy to take a patch. :)

You can also just override this regex via config for your personal use.
If you know you'd never use that style, then the problem becomes much
easier.

-Peff

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: git grep --show-function treats GOTO labels as function names
  2020-05-27 23:16       ` Jeff King
@ 2020-05-28 19:01         ` Johannes Sixt
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Sixt @ 2020-05-28 19:01 UTC (permalink / raw)
  To: Zach Riggle; +Cc: Jeff King, git

Am 28.05.20 um 01:16 schrieb Jeff King:
> On Wed, May 27, 2020 at 06:04:21PM -0500, Zach Riggle wrote:
>> It looks like that does the trick for "goto" labels, but there are
>> also some issue on function name parsing with attributes when they are
>> split onto a second line.
>>
>> $ cat attr.cpp
>> int main() __attribute__ ( (no_sanitize("alignment")) )
>> {
>>     FOO
>> }
>> $ git grep --no-index --show-function -e FOO attr.cpp
>> attr.cpp=2=__attribute__ ( (no_sanitize("alignment")) )
>> attr.cpp:4:    FOO
> 
> From your output, I assume the problematic input actually splits the
> attribute onto the second line?
> 
> I agree that's not ideal. The baked-in regex we use for matching C
> function lines is:
> 
>   $ git grep -nA4 cpp userdiff.c
>   userdiff.c:173:PATTERNS("cpp",
>   userdiff.c-174-  /* Jump targets or access declarations */
>   userdiff.c-175-  "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])\n"
>   userdiff.c-176-  /* functions/methods, variables, and compounds at top level */
>   userdiff.c-177-  "^((::[[:space:]]*)?[A-Za-z_].*)$",
> 
> so we mistake it for a function name. I'm not sure how easy it is to do
> better, though. We can add a line like:

C and C++ have a very versatile syntax and it turned out to be virtually
impossible to capture actual function definitions with a regex. See
8a2e8da367f7 ("userdiff: have 'cpp' hunk header pattern catch more C++
anchor points", 2014-03-21) for details.

With the current pattern, we catch probably 95% of the coding patterns
and coding styles. The style under discussion falls into the remaining
5% because continuation lines are not indented.

-- Hannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-05-28 19:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27 22:29 git grep --show-function treats GOTO labels as function names Zach Riggle
2020-05-27 22:48 ` Jeff King
2020-05-27 22:54   ` Zach Riggle
2020-05-27 23:04     ` Zach Riggle
2020-05-27 23:05       ` Zach Riggle
2020-05-27 23:16       ` Jeff King
2020-05-28 19:01         ` Johannes Sixt

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).