git@vger.kernel.org mailing list mirror (one of many)
* excluding a function from coccinelle transformation
@ 2018-08-24  6:42 Jeff King
  2018-08-24 11:04 ` [Cocci] " Julia Lawall
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2018-08-24  6:42 UTC
  To: git, cocci

In Git's Coccinelle patches, we sometimes want to suppress a
transformation inside a particular function. For example, in finding
conversions of hashcmp() to oidcmp(), we should not convert the call in
oidcmp() itself, since that would cause infinite recursion. We write the
semantic patch like this:

  @@
  identifier f != oidcmp;
  expression E1, E2;
  @@
    f(...) {...
  - hashcmp(E1->hash, E2->hash)
  + oidcmp(E1, E2)
    ...}

This catches some cases, but not all. For instance, there's one case in
sequencer.c which it does not convert. Now here's where it gets weird.
If I instead use the angle-bracket form of ellipses, like this:

  @@
  identifier f != oidcmp;
  expression E1, E2;
  @@
    f(...) {<...
  - hashcmp(E1->hash, E2->hash)
  + oidcmp(E1, E2)
    ...>}

then we do generate the expected diff! Here's a much more cut-down
source file that demonstrates the same behavior:

  int foo(void)
  {
    if (1)
      if (!hashcmp(x, y))
        return 1;
    return 0;
  }

If I remove the initial "if (1)" then a diff is generated with either
semantic patch (and the particulars of the "if" are not important; the
same thing happens if it's a while-loop. The key thing seems to be that
the code is not in the top-level block of the function).

And here's some double-weirdness. I get those results with spatch 1.0.4,
which is what's in Debian unstable. If I then upgrade to 1.0.6 from
Debian experimental, then _neither_ patch produces any results! Instead
I get:

  init_defs_builtins: /usr/lib/coccinelle/standard.h
  (ONCE) Expected tokens oidcmp hashcmp hash
  Skipping:foo.c

(whereas before, even the failing case said "HANDLING: foo.c").

And then one final check: I built coccinelle from the current tip of
https://github.com/coccinelle/coccinelle (1.0.7-00504-g670b2243).
With my cut-down case, that version generates a diff with either
semantic patch. But for the full-blown case in sequencer.c, it still
only works with the angle brackets.

So my questions are:

  - is this a bug in coccinelle? Or do I not understand how "..." is
    supposed to work here?

    (It does seem like there was possibly a separate bug introduced in
    1.0.6 that was later fixed; we can probably ignore that and just
    focus on the behavior in the current tip of master).

  - is there a better way to represent this kind of "transform this
    everywhere _except_ in this function" semantic patch? (preferably
    one that does not tickle this bug, if it is indeed a bug ;) ).

-Peff


* Re: [Cocci] excluding a function from coccinelle transformation
  2018-08-24  6:42 excluding a function from coccinelle transformation Jeff King
@ 2018-08-24 11:04 ` Julia Lawall
  2018-08-24 20:53   ` Jeff King
  0 siblings, 1 reply; 4+ messages in thread
From: Julia Lawall @ 2018-08-24 11:04 UTC
  To: Jeff King; +Cc: git, cocci



On Fri, 24 Aug 2018, Jeff King wrote:

> In Git's Coccinelle patches, we sometimes want to suppress a
> transformation inside a particular function. For example, in finding
> conversions of hashcmp() to oidcmp(), we should not convert the call in
> oidcmp() itself, since that would cause infinite recursion. We write the
> semantic patch like this:
>
>   @@
>   identifier f != oidcmp;
>   expression E1, E2;
>   @@
>     f(...) {...
>   - hashcmp(E1->hash, E2->hash)
>   + oidcmp(E1, E2)
>     ...}

The problem is with how ... works.  For transformation, A ... B
requires that B occur on every execution path starting with A, unless that
execution path ends up in error handling code
(e.g., if (...) { ... return; }).  Here your A is the start of the function.
So you need a call to hashcmp on every path through the function, which
fails when you add ifs.
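
To make the "every path" versus "some path" distinction concrete, here is a
toy Python model of the control flow of the cut-down foo() from the original
message. It is only a sketch: the graph encoding and the helper names
(all_paths, matches_forall, matches_exists) are invented for illustration, it
ignores the error-handling exception described above, and it says nothing
about how Coccinelle is actually implemented.

```python
# Toy model only; not Coccinelle's algorithm. Each graph maps a node label
# to the labels of its successors, and "exit" marks the end of the function.

def all_paths(cfg, node="entry", exit_node="exit"):
    """Yield every path from node to exit_node (assumes the graph has no loops)."""
    if node == exit_node:
        yield [node]
        return
    for succ in cfg[node]:
        for rest in all_paths(cfg, succ, exit_node):
            yield [node] + rest


def matches_forall(cfg, target):
    """True if target occurs on every path through the function."""
    return all(target in path for path in all_paths(cfg))


def matches_exists(cfg, target):
    """True if target occurs on at least one path through the function."""
    return any(target in path for path in all_paths(cfg))


# foo() with the hashcmp call nested under "if (1)": the false branch of
# "if (1)" reaches "return 0" without ever calling hashcmp.
nested = {
    "entry": ["if (1)"],
    "if (1)": ["hashcmp", "return 0"],
    "hashcmp": ["return 1", "return 0"],
    "return 1": ["exit"],
    "return 0": ["exit"],
}

# foo() with the "if (1)" removed: every path goes through hashcmp.
top_level = {
    "entry": ["hashcmp"],
    "hashcmp": ["return 1", "return 0"],
    "return 1": ["exit"],
    "return 0": ["exit"],
}
```

Under this simplified reading, matches_forall(nested, "hashcmp") is false
because one path skips the call, while matches_exists(nested, "hashcmp") and
matches_forall(top_level, "hashcmp") are both true.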

If you use * (searching) instead of - and + (transformation) it will only
require that a path exists.  * is meant for bug finding, where you often
want to find, e.g., whether there exists a path that is missing a free.

If you want the exists behavior with a transformation rule, then you can
put exists at the top of the rule between the initial @@.  I don't suggest
this in general, as it can lead to inconsistencies.

What you want is what you ended up using, which is <... P ...> which
allows zero or more occurrences of P.

However, this can all be very expensive, because you are matching paths
through the function definition which you don't really care about.  All
you care about here is the name.  So another approach is

@@
position p : script:python() { p[0].current_element != "oidcmp" };
expression E1,E2;
@@

- hashcmp(E1->hash, E2->hash)
+ oidcmp(E1, E2)

(I assume that "not equals" is written != in python)
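
For reference, != is indeed Python's inequality operator. The filtering idea
behind the script constraint can be sketched in plain Python as follows. The
Position tuple is a hypothetical stand-in for the position objects that
Coccinelle passes to the script (only the current_element attribute matters
here), keep_match is an invented helper name, and the file names are
placeholders.

```python
from collections import namedtuple

# Hypothetical stand-in for a Coccinelle position object; only the
# current_element attribute (the enclosing function's name) is modeled.
Position = namedtuple("Position", ["file", "current_element"])


def keep_match(p, excluded="oidcmp"):
    """Mirror the constraint: accept a match unless it sits inside `excluded`."""
    return p[0].current_element != excluded


# A call inside some ordinary function is kept for transformation...
inside_foo = [Position(file="foo.c", current_element="foo")]
# ...while a call inside oidcmp() itself is filtered out.
inside_oidcmp = [Position(file="example.h", current_element="oidcmp")]
```

With these placeholders, keep_match(inside_foo) is true and
keep_match(inside_oidcmp) is false, which is the "everywhere except in this
function" behavior being asked for.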

Another issue with A ... B is that by default A and B should not appear in
the matched region.  So your original rule matches only the case where
every execution path contains exactly one call to hashcmp, not more than
one.  So that was another problem with it.
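
The "exactly one call on every path" reading can be modeled the same way.
This is again a toy sketch with invented names (all_paths, calls_on_path,
plain_dots_would_match) and a made-up "name#n" labelling scheme to tell
repeated calls apart; it only illustrates the rule as stated above.

```python
# Toy model only; not Coccinelle's algorithm. Node labels of the form
# "hashcmp#1", "hashcmp#2" denote distinct calls to the same function.

def all_paths(cfg, node="entry", exit_node="exit"):
    """Yield every path from node to exit_node (assumes the graph has no loops)."""
    if node == exit_node:
        yield [node]
        return
    for succ in cfg[node]:
        for rest in all_paths(cfg, succ, exit_node):
            yield [node] + rest


def calls_on_path(path, callee):
    """Count how many nodes on the path are calls to callee."""
    return sum(1 for label in path if label.split("#")[0] == callee)


def plain_dots_would_match(cfg, callee):
    """True if every path contains exactly one call to callee."""
    return all(calls_on_path(path, callee) == 1 for path in all_paths(cfg))


# One call on the only path through the function.
one_call = {"entry": ["hashcmp#1"], "hashcmp#1": ["exit"]}

# Two calls in sequence on the same path.
two_calls = {
    "entry": ["hashcmp#1"],
    "hashcmp#1": ["hashcmp#2"],
    "hashcmp#2": ["exit"],
}
```

In this model plain_dots_would_match(one_call, "hashcmp") is true, while
plain_dots_would_match(two_calls, "hashcmp") is false because the single path
contains two calls.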

julia

* Re: [Cocci] excluding a function from coccinelle transformation
  2018-08-24 11:04 ` [Cocci] " Julia Lawall
@ 2018-08-24 20:53   ` Jeff King
  2018-08-24 21:00     ` Julia Lawall
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2018-08-24 20:53 UTC
  To: Julia Lawall; +Cc: git, cocci

On Fri, Aug 24, 2018 at 07:04:27AM -0400, Julia Lawall wrote:

> On Fri, 24 Aug 2018, Jeff King wrote:
> 
> > In Git's Coccinelle patches, we sometimes want to suppress a
> > transformation inside a particular function. For example, in finding
> > conversions of hashcmp() to oidcmp(), we should not convert the call in
> > oidcmp() itself, since that would cause infinite recursion. We write the
> > semantic patch like this:
> >
> >   @@
> >   identifier f != oidcmp;
> >   expression E1, E2;
> >   @@
> >     f(...) {...
> >   - hashcmp(E1->hash, E2->hash)
> >   + oidcmp(E1, E2)
> >     ...}
> 
> The problem is with how ... works.  For transformation, A ... B
> requires that B occur on every execution path starting with A, unless that
> execution path ends up in error handling code
> (e.g., if (...) { ... return; }).  Here your A is the start of the function.
> So you need a call to hashcmp on every path through the function, which
> fails when you add ifs.

Thank you! This explanation (and the one below about A and B not
appearing in the matched region) helped my understanding tremendously.

> What you want is what you ended up using, which is <... P ...> which
> allows zero or more occurrences of P.

And now this makes much more sense (I stumbled onto it through brute
force, but now I understand _why_ it works).

> However, this can all be very expensive, because you are matching paths
> through the function definition which you don't really care about.  All
> you care about here is the name.  So another approach is

Yeah, it is. Using the pre-1.0.7 version, the original patch runs in
~1.3 minutes on my machine. With "<... P ...>" it's almost 4 minutes.
Your python suggestion runs in about 1.5 minutes.

Curiously, 1.0.4 runs the original patch in only 24 seconds, and the
angle-bracket one takes 52 seconds. I'm not sure if something changed in
coccinelle, or if my build is simply less optimized (my 1.0.4 is from
the Debian package, and I'm building 1.0.7 from source; I had trouble
building 1.0.4 from source).

> @@
> position p : script:python() { p[0].current_element != "oidcmp" };
> expression E1,E2;
> @@
> 
> - hashcmp(E1->hash, E2->hash)
> + oidcmp(E1, E2)

Aha, this is exactly the magic I was hoping for. I agree this is the
best way to express it. I just had to tweak the patch to include the
position:

  - hashcmp@p(E1->hash, E2->hash)

and it worked great. Unfortunately, Debian's spatch is not built with
python support. :(

I'm not sure if we (the Git project) want to make the jump to requiring
a more specific spatch. OTOH, only a handful of developers actually run
it, and the python support does seem quite useful. And 1.0.4 is rather
old at this point.

Again, thanks very much for your response. I have a much better
understanding of what's going on now, and what our options are for
moving forward.

-Peff


* Re: [Cocci] excluding a function from coccinelle transformation
  2018-08-24 20:53   ` Jeff King
@ 2018-08-24 21:00     ` Julia Lawall
  0 siblings, 0 replies; 4+ messages in thread
From: Julia Lawall @ 2018-08-24 21:00 UTC
  To: Jeff King; +Cc: git, cocci



On Fri, 24 Aug 2018, Jeff King wrote:

> On Fri, Aug 24, 2018 at 07:04:27AM -0400, Julia Lawall wrote:
>
> > On Fri, 24 Aug 2018, Jeff King wrote:
> >
> > > In Git's Coccinelle patches, we sometimes want to suppress a
> > > transformation inside a particular function. For example, in finding
> > > conversions of hashcmp() to oidcmp(), we should not convert the call in
> > > oidcmp() itself, since that would cause infinite recursion. We write the
> > > semantic patch like this:
> > >
> > >   @@
> > >   identifier f != oidcmp;
> > >   expression E1, E2;
> > >   @@
> > >     f(...) {...
> > >   - hashcmp(E1->hash, E2->hash)
> > >   + oidcmp(E1, E2)
> > >     ...}
> >
> > The problem is with how ... works.  For transformation, A ... B
> > requires that B occur on every execution path starting with A, unless that
> > execution path ends up in error handling code
> > (e.g., if (...) { ... return; }).  Here your A is the start of the function.
> > So you need a call to hashcmp on every path through the function, which
> > fails when you add ifs.
>
> Thank you! This explanation (and the one below about A and B not
> appearing in the matched region) helped my understanding tremendously.
>
> > What you want is what you ended up using, which is <... P ...> which
> > allows zero or more occurrences of P.
>
> And now this makes much more sense (I stumbled onto it through brute
> force, but now I understand _why_ it works).
>
> > However, this can all be very expensive, because you are matching paths
> > through the function definition which you don't really care about.  All
> > you care about here is the name.  So another approach is
>
> Yeah, it is. Using the pre-1.0.7 version, the original patch runs in
> ~1.3 minutes on my machine. With "<... P ...>" it's almost 4 minutes.
> Your python suggestion runs in about 1.5 minutes.
>
> Curiously, 1.0.4 runs the original patch in only 24 seconds, and the
> angle-bracket one takes 52 seconds. I'm not sure if something changed in
> coccinelle, or if my build is simply less optimized (my 1.0.4 is from
> the Debian package, and I'm building 1.0.7 from source; I had trouble
> building 1.0.4 from source).

I don't remember the exact status of 1.0.4.  It is possible that an
optimization was found to pose problems and was removed in the meantime.

<... ...> can be useful when you expect it to match, e.g., an if branch.  For
a function with over 1000 lines and many conditionals, it might not be a
good idea.  Actually, the main problem is with loops.  If there is a loop
in the function, matching can be much slower.

julia
