git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Matheus Tavares <matheus.bernardino@usp.br>, johannes.schindelin@gmx.de
Cc: christian.couder@gmail.com, frekui@gmail.com,
	git@vger.kernel.org, j6t@kdbg.org, jonathantanmy@google.com,
	peff@peff.net, sandals@crustytoothpaste.net
Subject: Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
Date: Sun, 26 Jul 2020 19:41:08 +0200	[thread overview]
Message-ID: <a35a9334-6db2-d8e3-8ce5-15f15a9005e1@web.de> (raw)
In-Reply-To: <20200718035201.42233-1-matheus.bernardino@usp.br>

Am 18.07.20 um 05:52 schrieb Matheus Tavares:
> On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>>
>> Now, I am _far_ from knowing what I'm doing with Coccinelle, but I think
>> this here semantic patch should get you going:
>>
>> -- snipsnap --
>> @@
>> expression E;
>> @@
>>   {
>> ++   char hex[GIT_MAX_HEXSZ + 1];
>>      ...
>> -    oid_to_hex(E)
>> +    oid_to_hex_r(hex, E)
>>      ...
>>   }
>>
>> @@
>> expression E1, E2;
>> @@
>>   {
>> ++   char hex1[GIT_MAX_HEXSZ + 1], hex2[GIT_MAX_HEXSZ + 1];
>>      ...
>> -    oid_to_hex(E1)
>> +    oid_to_hex_r(hex1, E1)
>>      ...
>> -    oid_to_hex(E2)
>> +    oid_to_hex_r(hex2, E2)
>>      ...
>>   }
>
> Thanks for this nice example! This already worked very well in some of
> my tests :)
>
> However, with my _very_ limited notion of Coccinelle, I didn't
> understand why some code snippets didn't match the above rules. For
> example, the structure below:
>
> func(...)
> {
> 	if (cond)
> 		func2("%s", oid_to_hex(a));
> }
>
> I thought it could be because the `if` statement is missing the curly
> brackets (and it does work if I add the brackets), but to my surprise,
> adding another oid_to_hex() call in an `else` case also made the code
> match the rule:
>
> func(...)
> {
> 	if (cond)
> 		func2("%s", oid_to_hex(a));
> 	else
> 		func2("%s", oid_to_hex(a));
> }
>
> The following snippet also correctly matches, but spatch introduces only
> one `hex` variable:
>
> 	if (cond)
> 		func2("%s, %s", oid_to_hex(a), oid_to_hex(b));
> 	else
> 		func2("%s", oid_to_hex(a));
>

The produced diff looks like this:

+	char hex[GIT_MAX_HEXSZ + 1];
 	if (cond)
-		func2("%s, %s", oid_to_hex(a), oid_to_hex(b));
+		func2("%s, %s", oid_to_hex_r(hex, a), oid_to_hex_r(hex, b));
 	else
-		func2("%s", oid_to_hex(a));
+		func2("%s", oid_to_hex_r(hex, a));

So the first rule was applied (the one for a single oid_to_hex call),
but we need the second one (for two oid_to_hex calls).  Using the same
hex buffer for two different values won't work.

> We will probably want our semantic rules to handle an arbitrary number
> of `oid_to_hex()` calls in each function, but in scenarios like the
> above one, we only really need 2 hex buffers despite having 3 calls...

oid_to_hex() has two interesting features, and we need to make sure they
are preserved for callers that use them: It has a ring of four buffers,
so you can use it for four different values, and those buffers are
static, so its results can be passed around arbitrarily.

> That might be a little tricky, I guess.

It may be impossible to cover all cases.  E.g. callers of oid_to_hex()
could return that buffer (like e.g. diff_aligned_abbrev()) or save
them in some global variable (like e.g. string_list_append() with a
non-DUP string_list).  But safe conversion rules can got us surprisingly
far.

> Another thing that might be tricky in this conversion is checking for
> name conflicts with the added `hex` variable (but maybe Coccinelle
> already has a facilitator mechanism for such cases? IDK).

That's easy.  There exists no hex_buffer, yet.  We can just claim that
name for automatic conversions.  It's a bit too long for people to
type out (we have a few variables named hexbuffer, though), but not
crazy long.

So what is safe?  Calls of oid_to_hex() in the argument list of many
functions is.  E.g. puts() and printf() just consume the string that
is passed to them, and they don't store it anywhere.  That means no
static storage is needed for those.

string_list_append() is only safe if it's the duplicating variant.
Since this is a runtime option of the underlying string_list this is
hard to prove in a Coccinelle rule.  The time is better spent
converting them manually, I think.

And function calls with more than four oid_to_hex() calls are broken
already, so we only need to have rules for up to four of them.  Here
is the simplest rule I can come up with for handling up to four
oid_to_hex() calls:

@@
identifier fn =~ "^(.*printf.*|puts)$";
@@
(
  fn(...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer2,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer3,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer4,
      ...
    ), ...
  )
|
  fn(...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer2,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer3,
      ...
    ), ...
  )
|
  fn(...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer,
      ...
    ), ...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer2,
      ...
    ), ...
  )
|
  fn(...,
-   oid_to_hex
+   oid_to_hex_r
    (
+     hex_buffer,
      ...
    ), ...
  )
)

So the sub-rules are ordered from matching all four possible
oid_to_hex() calls down to a single one.  Only safe consumers are
matched.  That regex can and should be extended.

Having a list of good consumers has the nice side-effect of speeding up
the diff generation.  It still takes a few minutes on my system, though.

We still need to declare of the local buffers.  We can add them on
demand to each function with rules like this:

@avoid_duplicates@
identifier buf =~ "^hex_buffer[2-4]?$";
@@
- char buf[GIT_MAX_HEXSZ + 1];

@hex_buffer4_on_demand exists@
identifier fn;
@@
  fn(...) {
+   char hex_buffer4[GIT_MAX_HEXSZ + 1];
    <+... hex_buffer4 ...+>
  }

@hex_buffer3_on_demand exists@
identifier fn;
@@
  fn(...) {
+   char hex_buffer3[GIT_MAX_HEXSZ + 1];
    <+... hex_buffer3 ...+>
  }

@hex_buffer2_on_demand exists@
identifier fn;
@@
  fn(...) {
+   char hex_buffer2[GIT_MAX_HEXSZ + 1];
    <+... hex_buffer2 ...+>
  }

@hex_buffer_on_demand exists@
identifier fn;
@@
  fn(...) {
+   char hex_buffer[GIT_MAX_HEXSZ + 1];
    <+... hex_buffer ...+>
  }

Why remove them first?  To avoid duplicates when other convertible
oid_to_hex() calls are added later and the semantic patch applied
again.

Why declare the buffers with function scope?  To avoid shadowing.

Why are the rules reversed?  They add declarations at the top, so
hex_buffer to hex_buffer4 end up being declared in that order in
the resulting file.

Why not use the "fresh identifier" feature of Coccinelle to generate
an unused name each time?  I don't know how to integrate that into
the 4/3/2/1 rule above without having to repeat the list of safe
consumers or declaring unused buffers.  And reusing buffers between
safe consumers is fine.

René

      reply	other threads:[~2020-07-26 17:41 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-24 22:29 [RFC] Thread safety in some low-level functions Matheus Tavares Bernardino
2020-06-24 22:52 ` Matheus Tavares Bernardino
2020-06-25  1:38   ` brian m. carlson
2020-06-25 20:32     ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares
2020-06-25 20:32       ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
2020-06-26  5:45         ` Christian Couder
2020-06-26 14:13           ` Matheus Tavares Bernardino
2020-06-25 20:32       ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
2020-06-25 23:07         ` brian m. carlson
2020-06-25 23:54           ` Matheus Tavares Bernardino
2020-06-26  0:00             ` Matheus Tavares Bernardino
2020-06-26  6:02         ` Christian Couder
2020-06-26  8:22       ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder
2020-06-26 16:22         ` Matheus Tavares Bernardino
2020-06-26 21:54       ` [PATCH v2 " Matheus Tavares
2020-06-26 21:54         ` [PATCH v2 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
2020-06-26 21:54         ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
2020-06-29 15:11           ` Johannes Schindelin
2020-06-30 20:37             ` Matheus Tavares Bernardino
2020-07-01 11:32               ` Johannes Schindelin
2020-07-16 11:29           ` Johannes Schindelin
2020-07-18  3:09             ` Matheus Tavares Bernardino
2020-08-10 14:15               ` Johannes Schindelin
2020-07-18  3:52             ` Matheus Tavares
2020-07-26 17:41               ` René Scharfe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a35a9334-6db2-d8e3-8ce5-15f15a9005e1@web.de \
    --to=l.s.r@web.de \
    --cc=christian.couder@gmail.com \
    --cc=frekui@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=j6t@kdbg.org \
    --cc=johannes.schindelin@gmx.de \
    --cc=jonathantanmy@google.com \
    --cc=matheus.bernardino@usp.br \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).