Coverity false positives triggered by gnulib's implementation of base64

bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed

* Coverity false positives triggered by gnulib's implementation of base64
@ 2019-05-07 14:22 Kamil Dudka
  2019-05-08  8:15 ` Bruno Haible
  2019-05-09 19:14 ` Paul Eggert
  0 siblings, 2 replies; 16+ messages in thread
From: Kamil Dudka @ 2019-05-07 14:22 UTC (permalink / raw)
  To: bug-gnulib

Coverity Analysis 2019.03 incorrectly marks the input argument 
of base64_encode(), and conseuqnetly base64_encode_alloc(), as
tainted_data_sink because it sees byte-level operations on the input.

It triggered the following false positives in the cryptsetup project:

Error: TAINTED_SCALAR:
lib/luks2/luks2_digest_pbkdf2.c:117: tainted_data_argument: Calling function "crypt_random_get" taints argument "salt".
lib/luks2/luks2_digest_pbkdf2.c:157: tainted_data: Passing tainted variable "salt" to a tainted sink.

Error: TAINTED_SCALAR:
lib/luks2/luks2_keyslot_luks2.c:445: tainted_data_argument: Calling function "crypt_random_get" taints argument "salt".
lib/luks2/luks2_keyslot_luks2.c:448: tainted_data: Passing tainted variable "salt" to a tainted sink.

... but it can affect other gnulib-based projects, too.  Would it be
possible to apply the following one-line patch on gnulib source code
to suppress this class of false positives in gnulib-based projects?

https://gitlab.com/cryptsetup/cryptsetup/commit/75b2610e

Thanks in advance for considering it!

Kamil

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-07 14:22 Coverity false positives triggered by gnulib's implementation of base64 Kamil Dudka
@ 2019-05-08  8:15 ` Bruno Haible
  2019-05-09 16:14   ` Kamil Dudka
  2019-05-09 19:14 ` Paul Eggert
  1 sibling, 1 reply; 16+ messages in thread
From: Bruno Haible @ 2019-05-08  8:15 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Kamil Dudka

Hi Kamil,

> Coverity Analysis 2019.03 incorrectly marks the input argument 
> of base64_encode(), and conseuqnetly base64_encode_alloc(), as
> tainted_data_sink because it sees byte-level operations on the input.
> 
> It triggered the following false positives in the cryptsetup project:
> 
> Error: TAINTED_SCALAR:
> lib/luks2/luks2_digest_pbkdf2.c:117: tainted_data_argument: Calling function "crypt_random_get" taints argument "salt".
> lib/luks2/luks2_digest_pbkdf2.c:157: tainted_data: Passing tainted variable "salt" to a tainted sink.
> 
> Error: TAINTED_SCALAR:
> lib/luks2/luks2_keyslot_luks2.c:445: tainted_data_argument: Calling function "crypt_random_get" taints argument "salt".
> lib/luks2/luks2_keyslot_luks2.c:448: tainted_data: Passing tainted variable "salt" to a tainted sink.
> 
> 
> ... but it can affect other gnulib-based projects, too.  Would it be
> possible to apply the following one-line patch on gnulib source code
> to suppress this class of false positives in gnulib-based projects?
> 
> https://gitlab.com/cryptsetup/cryptsetup/commit/75b2610e

When I read the description of "tainted data" [1] and of the
recommendations how to deal with such warnings [2], it is clear that
the warning/error is about the global data flow. Therefore it seems
inappropriate to me to put annotations about the global data flow
into gnulib (which is shared among multiple packages).

Therefore, what I would suggest is that you create an inline function
that merely invokes base64_encode_alloc, use it in line 157 of [3],
and put your annotation on it.

Does it need to be done in the source code at all? Some Coverity tools
have a UI that allows the developers to mark the findings as "false
positive" or as "handled" without touching the source code, and these
marks persist across modifications of the source code. Don't you have
this possibility for this Coverity tool?

Bruno

[1] https://community.synopsys.com/s/article/Tainted-data-in-Coverity
[2] https://stackoverflow.com/questions/24772247/how-to-handle-coverity-error-tainted-scalar-in-fread
[3] https://fossies.org/linux/cryptsetup/lib/luks2/luks2_digest_pbkdf2.c

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-08  8:15 ` Bruno Haible
@ 2019-05-09 16:14   ` Kamil Dudka
  2019-05-09 20:35     ` Bruno Haible
  0 siblings, 1 reply; 16+ messages in thread
From: Kamil Dudka @ 2019-05-09 16:14 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib

Hi Bruno,

On Wednesday, May 8, 2019 10:15:37 AM CEST Bruno Haible wrote:
> Hi Kamil,
> 
> > Coverity Analysis 2019.03 incorrectly marks the input argument
> > of base64_encode(), and conseuqnetly base64_encode_alloc(), as
> > tainted_data_sink because it sees byte-level operations on the input.
> > 
> > It triggered the following false positives in the cryptsetup project:
> > 
> > Error: TAINTED_SCALAR:
> > lib/luks2/luks2_digest_pbkdf2.c:117: tainted_data_argument: Calling
> > function "crypt_random_get" taints argument "salt".
> > lib/luks2/luks2_digest_pbkdf2.c:157: tainted_data: Passing tainted
> > variable "salt" to a tainted sink.
> > 
> > Error: TAINTED_SCALAR:
> > lib/luks2/luks2_keyslot_luks2.c:445: tainted_data_argument: Calling
> > function "crypt_random_get" taints argument "salt".
> > lib/luks2/luks2_keyslot_luks2.c:448: tainted_data: Passing tainted
> > variable "salt" to a tainted sink.
> > 
> > 
> > ... but it can affect other gnulib-based projects, too.  Would it be
> > possible to apply the following one-line patch on gnulib source code
> > to suppress this class of false positives in gnulib-based projects?
> > 
> > https://gitlab.com/cryptsetup/cryptsetup/commit/75b2610e
> 
> When I read the description of "tainted data" [1] and of the
> recommendations how to deal with such warnings [2], it is clear that
> the warning/error is about the global data flow. Therefore it seems
> inappropriate to me to put annotations about the global data flow
> into gnulib (which is shared among multiple packages).

Yes, the false positives will not appear while analyzing gnulib's code
in isolation, if this is what you mean.  On the other hand, gnulib is
intended to be bundled with other projects, where these false positives
will likely trigger.

You can find a brief definition of taint analysis for example here:

https://www.owasp.org/index.php/Static_Code_Analysis#Taint_Analysis

There are 3 important state-transitions in the data-flow analysis:

(1) obtaining data from untrusted source
(2) sanitizing the data (checking bounds etc.)
(3) unsafe use of untrusted data

gnulib's base64_encode() as seen by Coverity Analysis represents (3)
because its implementation uses byte swaps.  This is a heuristic that
is not always correct, so false positives may happen.  If you ask why
byte swaps are checked, I believe it was inspired by Heartbleed:

https://www.synopsys.com/blogs/software-security/detecting-heartbleed-with-static-analysis/

The inline annotation that I proposed as a patch gives Coverity a hint
that gnulib's implementation of base64_encode() can safely process data
from untrusted sources.  The annotation is specific to the implementation
of the function, not to users of the function.  Thanks to the annotation,
the false positives will be suppressed in all projects that bundle gnulib.

> Therefore, what I would suggest is that you create an inline function
> that merely invokes base64_encode_alloc, use it in line 157 of [3],
> and put your annotation on it.

There is no need for such obfuscation in cryptsetup since they have the
inline annotation already added in their own copy of the base64 module.
I was just trying to improve the developer experience of gnulib itself.

> Does it need to be done in the source code at all?

Yes, in case of gnulib this is the only sensible option.

> Some Coverity tools
> have a UI that allows the developers to mark the findings as "false
> positive" or as "handled" without touching the source code, and these
> marks persist across modifications of the source code. Don't you have
> this possibility for this Coverity tool?

Yes, various tools exist to waive false positives.  The problem is that
instances of these tools do not share data with each other in the universe.
Consequently, developers have to repeatedly review these false positives
and waive them in each single instance of these tools.  And even worse with
gnulib because these false positives are usually not matched across different
project that bundle gnulib, even if you have a single instance of the waiving
tool in your organisation.

Kamil

> Bruno
> 
> [1] https://community.synopsys.com/s/article/Tainted-data-in-Coverity
> [2]
> https://stackoverflow.com/questions/24772247/how-to-handle-coverity-error-t
> ainted-scalar-in-fread [3]
> https://fossies.org/linux/cryptsetup/lib/luks2/luks2_digest_pbkdf2.c

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-07 14:22 Coverity false positives triggered by gnulib's implementation of base64 Kamil Dudka
  2019-05-08  8:15 ` Bruno Haible
@ 2019-05-09 19:14 ` Paul Eggert
  2019-05-10 11:32   ` Kamil Dudka
  1 sibling, 1 reply; 16+ messages in thread
From: Paul Eggert @ 2019-05-09 19:14 UTC (permalink / raw)
  To: Kamil Dudka; +Cc: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 2585 bytes --]

On 5/7/19 7:22 AM, Kamil Dudka wrote:
> It triggered the following false positives in the cryptsetup project:
>
> Error: TAINTED_SCALAR:
> lib/luks2/luks2_digest_pbkdf2.c:117: tainted_data_argument: Calling function "crypt_random_get" taints argument "salt".
> lib/luks2/luks2_digest_pbkdf2.c:157: tainted_data: Passing tainted variable "salt" to a tainted sink.
>
> Error: TAINTED_SCALAR:
> lib/luks2/luks2_keyslot_luks2.c:445: tainted_data_argument: Calling function "crypt_random_get" taints argument "salt".
> lib/luks2/luks2_keyslot_luks2.c:448: tainted_data: Passing tainted variable "salt" to a tainted sink.
>
>
> ... but it can affect other gnulib-based projects, too.

Can you explain in more detail what these diagnostic messages mean, and
why this is Gnulib's "fault"? For example, how do you know that the
reports are false positives and not true positives? That sort of thing.
Normally I'm a fan of static analysis, but it can go too far sometimes....

> gnulib's base64_encode() as seen by Coverity Analysis represents (3)
> because its implementation uses byte swaps.
base64_encode does not actually swap bytes. For example, it wouldn't be
reasonable for base64.c to include byteswap.h in order to clarify
base64_encode, because the byteswap.h's byteswapping operations wouldn't
help. Also, even if base64_encode were swapping bytes, byte-swapping by
itself is not a problem and I don't see why Heartbleed is relevant here,
as the resulting subscripting operations are obviously in range. So I
would like to to know exactly why Coverity infers that this particular
byte-manipulation is problematic.

For example, does the problem go away if you replace 'b64c[to_uchar
(in[0]) >> 2]' with 'b64c[(to_uchar (in[0]) >> 2) & 0x3f]'? If so, it
would indicate that Coverity merely needs to improve their range analysis.

More generally, does it help if you make simple changes to base64.c that
might clarify the code a bit and in addition pacify Coverity? If so,
that might be preferable. Something like the attached untested patch,
say. The idea is to somehow convince Coverity that the code is OK, while
not making it worse (and maybe making it better).


> /* coverity[-tainted_data_sink: arg-0] */
Just adding a line like that would be too cryptic. We can't expect the
reader to know the ins and outs of a proprietary 3rd-party tool.
Instead, the comment would have to explain what's going on and why the
comment is needed, well enough for the Gnulib audience.

However, I'm hoping that we don't need the line at all, because it
sounds like Coverity might be buggy here.


[-- Attachment #2: base64.patch --]
[-- Type: text/x-patch, Size: 2299 bytes --]

diff --git a/lib/base64.c b/lib/base64.c
index f3f7298aa..786b0f418 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -64,11 +64,11 @@ static const char b64c[64] =
 
 /* Base64 encode IN array of size INLEN into OUT array. OUT needs
    to be of length >= BASE64_LENGTH(INLEN), and INLEN needs to be
-   a multiple of 3.  */
+   a positive multiple of 3.  */
 static void
 base64_encode_fast (const char *restrict in, size_t inlen, char *restrict out)
 {
-  while (inlen)
+  do
     {
       *out++ = b64c[to_uchar (in[0]) >> 2];
       *out++ = b64c[((to_uchar (in[0]) << 4) + (to_uchar (in[1]) >> 4)) & 0x3f];
@@ -78,6 +78,7 @@ base64_encode_fast (const char *restrict in, size_t inlen, char *restrict out)
       inlen -= 3;
       in += 3;
     }
+  while (inlen);
 }
 
 /* Base64 encode IN array of size INLEN into OUT array of size OUTLEN.
@@ -88,6 +89,9 @@ void
 base64_encode (const char *restrict in, size_t inlen,
                char *restrict out, size_t outlen)
 {
+  if (!outlen)
+    return;
+
   /* Note this outlen constraint can be enforced at compile time.
      I.E. that the output buffer is exactly large enough to hold
      the encoded inlen bytes.  The inlen constraints (of corresponding
@@ -101,16 +105,17 @@ base64_encode (const char *restrict in, size_t inlen,
       return;
     }
 
-  while (inlen && outlen)
+  if (inlen)
+   while (true)
     {
       *out++ = b64c[to_uchar (in[0]) >> 2];
       if (!--outlen)
-        break;
+        return;
       *out++ = b64c[((to_uchar (in[0]) << 4)
                        + (--inlen ? to_uchar (in[1]) >> 4 : 0))
                       & 0x3f];
       if (!--outlen)
-        break;
+        return;
       *out++ =
         (inlen
          ? b64c[((to_uchar (in[1]) << 2)
@@ -118,18 +123,17 @@ base64_encode (const char *restrict in, size_t inlen,
                   & 0x3f]
          : '=');
       if (!--outlen)
-        break;
+        return;
       *out++ = inlen ? b64c[to_uchar (in[2]) & 0x3f] : '=';
       if (!--outlen)
+        return;
+      if (inlen <= 1)
         break;
-      if (inlen)
-        inlen--;
-      if (inlen)
-        in += 3;
+      inlen--;
+      in += 3;
     }
 
-  if (outlen)
-    *out = '\0';
+  *out = '\0';
 }
 
 /* Allocate a buffer and store zero terminated base64 encoded data

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-09 16:14   ` Kamil Dudka
@ 2019-05-09 20:35     ` Bruno Haible
  2019-05-09 21:55       ` Paul Eggert
  2019-05-10 11:41       ` Kamil Dudka
  0 siblings, 2 replies; 16+ messages in thread
From: Bruno Haible @ 2019-05-09 20:35 UTC (permalink / raw)
  To: Kamil Dudka; +Cc: bug-gnulib

Hi Kamil,

> There are 3 important state-transitions in the data-flow analysis:
> 
> (1) obtaining data from untrusted source
> (2) sanitizing the data (checking bounds etc.)
> (3) unsafe use of untrusted data
> 
> gnulib's base64_encode() as seen by Coverity Analysis represents (3)
> because its implementation uses byte swaps.  This is a heuristic that
> is not always correct, so false positives may happen.  If you ask why
> byte swaps are checked, I believe it was inspired by Heartbleed:
> 
> https://www.synopsys.com/blogs/software-security/detecting-heartbleed-with-static-analysis/
> 
> The inline annotation that I proposed as a patch gives Coverity a hint
> that gnulib's implementation of base64_encode() can safely process data
> from untrusted sources.  The annotation is specific to the implementation
> of the function, not to users of the function.

Ah, thanks for explaining. Now I agree: base64_encode produces the
warning because of the (x << n) | (y >> m) expression patterns that
resemble a byte swap. It would do so also for any other program that
contains a base64_encode invocation with untrusted input as argument.

> > Does it need to be done in the source code at all?
> 
> Yes, in case of gnulib this is the only sensible option.
> ...
> Yes, various tools exist to waive false positives.  The problem is that
> instances of these tools do not share data with each other in the universe.
> Consequently, developers have to repeatedly review these false positives
> and waive them in each single instance of these tools.  And even worse with
> gnulib because these false positives are usually not matched across different
> project that bundle gnulib, even if you have a single instance of the waiving
> tool in your organisation.

So, I propose to bite the bullet, but at least put a reasonable comment.


2019-05-09  Kamil Dudka  <kdudka@redhat.com>
            Bruno Haible  <bruno@clisp.org>

	base64: Avoid false positive warning from Coverity.
	* lib/base64.c (base64_encode): Add special comment for Coverity.

diff --git a/lib/base64.c b/lib/base64.c
index f3f7298..80428bb 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -84,6 +84,11 @@ base64_encode_fast (const char *restrict in, size_t inlen, char *restrict out)
    If OUTLEN is less than BASE64_LENGTH(INLEN), write as many bytes as
    possible.  If OUTLEN is larger than BASE64_LENGTH(INLEN), also zero
    terminate the output buffer. */
+/* Tell Coverity that this function works fine when called with IN
+   pointing to untrusted input.  By default, Coverity, seeing the value
+   shift expressions below, thinks that it is dangerous to call this
+   function with untrusted input.
+   coverity[-tainted_data_sink: arg-0]  */
 void
 base64_encode (const char *restrict in, size_t inlen,
                char *restrict out, size_t outlen)



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-09 20:35     ` Bruno Haible
@ 2019-05-09 21:55       ` Paul Eggert
  2019-05-09 22:13         ` Bruno Haible
  2019-05-10 11:41       ` Kamil Dudka
  1 sibling, 1 reply; 16+ messages in thread
From: Paul Eggert @ 2019-05-09 21:55 UTC (permalink / raw)
  To: Bruno Haible, Kamil Dudka; +Cc: bug-gnulib

On 5/9/19 1:35 PM, Bruno Haible wrote:
>> https://www.synopsys.com/blogs/software-security/detecting-heartbleed-with-static-analysis/
>>
> base64_encode produces the
> warning because of the (x << n) | (y >> m) expression patterns that
> resemble a byte swap. It would do so also for any other program that
> contains a base64_encode invocation with untrusted input as argument.
>
Sorry, I'm still not following. Unless the tainted data is used to
calculate an array index, there's no problem with Heartbleed and the
Coverity heuristic should not diagnose a problem. Within base64_encode
itself, there's no problem with the calculated array indices because
they're obviously in range. Conversely, if the caller is using the
output of base64_encode to compute an array index, that is indeed a
potential problem that may indicate a Heartbleed-related bug. But in
that case, the proposed comment would be wrong as it would pacify
Coverity without fixing the real bug elsewhere.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-09 21:55       ` Paul Eggert
@ 2019-05-09 22:13         ` Bruno Haible
  2019-05-09 22:28           ` Paul Eggert
  2019-05-10 11:57           ` Kamil Dudka
  0 siblings, 2 replies; 16+ messages in thread
From: Bruno Haible @ 2019-05-09 22:13 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Kamil Dudka, bug-gnulib

Hi Paul,

> Sorry, I'm still not following. Unless the tainted data is used to
> calculate an array index, there's no problem with Heartbleed and the
> Coverity heuristic should not diagnose a problem.

Yes, IF they were only using an algorithm and no heuristic,
base64_encode would not be flagged as a dangerous consumer of
untrusted input.

But their article
https://www.synopsys.com/blogs/software-security/detecting-heartbleed-with-static-analysis/
says:

  "Program analysis is hard and approximations and trade-offs are
   absolutely mandatory. We’ve found that the best results come from
   a combination of advanced algorithms and knowledge of idioms that
   occur in real-world code."

So they are combining data flow analysis - in order to determine
that the argument of base64_alloc is untrusted data - with a
heuristic - "if a function contains array accesses with indices that
are computed with ntohs calls, we should flag it as dangerous consumer".

> the proposed comment would be wrong as it would pacify
> Coverity without fixing the real bug elsewhere.

The base64_encode function does not make a dangerous array
access (only to the 'b64c' array). And its result is a string,
not an integer, and therefore cannot be used as an array index
either. Therefore adding this comment cannot silence real bugs.

But maybe it will be sufficient to mask all b64c arguments
with '& 0x3f', like you already suggested in the other mail?

Bruno

diff --git a/lib/base64.c b/lib/base64.c
index f3f7298..a00e0f4 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -70,7 +70,7 @@ base64_encode_fast (const char *restrict in, size_t inlen, char *restrict out)
 {
   while (inlen)
     {
-      *out++ = b64c[to_uchar (in[0]) >> 2];
+      *out++ = b64c[(to_uchar (in[0]) >> 2) & 0x3f];
       *out++ = b64c[((to_uchar (in[0]) << 4) + (to_uchar (in[1]) >> 4)) & 0x3f];
       *out++ = b64c[((to_uchar (in[1]) << 2) + (to_uchar (in[2]) >> 6)) & 0x3f];
       *out++ = b64c[to_uchar (in[2]) & 0x3f];
@@ -103,7 +103,7 @@ base64_encode (const char *restrict in, size_t inlen,

   while (inlen && outlen)
     {
-      *out++ = b64c[to_uchar (in[0]) >> 2];
+      *out++ = b64c[(to_uchar (in[0]) >> 2) & 0x3f];
       if (!--outlen)
         break;
       *out++ = b64c[((to_uchar (in[0]) << 4)

Bruno

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-09 22:13         ` Bruno Haible
@ 2019-05-09 22:28           ` Paul Eggert
  2019-05-10 11:57           ` Kamil Dudka
  1 sibling, 0 replies; 16+ messages in thread
From: Paul Eggert @ 2019-05-09 22:28 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Kamil Dudka, bug-gnulib

On 5/9/19 3:13 PM, Bruno Haible wrote:
> So they are combining data flow analysis - in order to determine
> that the argument of base64_alloc is untrusted data - with a
> heuristic - "if a function contains array accesses with indices that
> are computed with ntohs calls, we should flag it as dangerous consumer".

If that's their heuristic then it's obviously bogus and should get fixed.

I understood it to be more complicated than that. For example, perhaps
they see that base64_encode copies its input data to its output buffer
via a swap-like function so that if the input is tainted then the output
is too; this is not an unreasonable heuristic and if they recently added
it they'll be finding more Heartbleed-related bugs in calling code than
they did before. However, if that's the case, the problematic area is in
the calling code, and adding the comment could mask real bugs.

> But maybe it will be sufficient to mask all b64c arguments
> with '& 0x3f', like you already suggested in the other mail?
I hope so. Partly because I hope GCC will optimize away the &0x3f so
there's no runtime cost. And partly because if it does pacify Coverity
we can be pretty sure that it's just a Coverity false alarm and Coverity
should get fixed (which means less work for us :-).

If the comment is really needed then I'd like to know more about the
problem, because it currently remains a mystery (at least to me).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-09 19:14 ` Paul Eggert
@ 2019-05-10 11:32   ` Kamil Dudka
  2019-05-10 11:34     ` Florian Weimer
  2019-05-10 23:36     ` Paul Eggert
  0 siblings, 2 replies; 16+ messages in thread
From: Kamil Dudka @ 2019-05-10 11:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: bug-gnulib

On Thursday, May 9, 2019 9:14:40 PM CEST Paul Eggert wrote:
> On 5/7/19 7:22 AM, Kamil Dudka wrote:
> > It triggered the following false positives in the cryptsetup project:
> > 
> > Error: TAINTED_SCALAR:
> > lib/luks2/luks2_digest_pbkdf2.c:117: tainted_data_argument: Calling
> > function "crypt_random_get" taints argument "salt".
> > lib/luks2/luks2_digest_pbkdf2.c:157: tainted_data: Passing tainted
> > variable "salt" to a tainted sink.
> > 
> > Error: TAINTED_SCALAR:
> > lib/luks2/luks2_keyslot_luks2.c:445: tainted_data_argument: Calling
> > function "crypt_random_get" taints argument "salt".
> > lib/luks2/luks2_keyslot_luks2.c:448: tainted_data: Passing tainted
> > variable "salt" to a tainted sink.
> > 
> > 
> > ... but it can affect other gnulib-based projects, too.
> 
> Can you explain in more detail what these diagnostic messages mean,

1. crypt_random_get() reads data from untrusted source (file descriptor).

2. The data is given to base64_encode() without being checked first.

> and why this is Gnulib's "fault"?

Nobody said it was Gnulib's "fault".  I am just proposing a one-line comment 
that would significantly help to many direct and indirect users of Gnulib.

> For example, how do you know that the reports are false positives and not
> true positives?

I think it was obvious from my previous explanation:

(1) You need to check (by manual review) that the source of data is really 
untrusted.

(2) You need to check (by manual review) that there is no sufficient check
on the data.

(3) You need to check (by manual review) that the sink function is really 
vulnerable to data from untrusted source.

When doing step (3), I verified that Gnulib's base64_encode() can safely 
process data from untrusted source.  Then I wanted to record this information 
into the source code so that other users of Gnulib do not need to verify this 
each time they run Coverity on a project that bundles Gnulib's implementation 
of base64_encode().

> That sort of thing.
> Normally I'm a fan of static analysis, but it can go too far sometimes....

This is a matter of taste.  I do not think that all users of Gnulib share
your opinion on this.

> > gnulib's base64_encode() as seen by Coverity Analysis represents (3)
> > because its implementation uses byte swaps.
> 
> base64_encode does not actually swap bytes. For example, it wouldn't be
> reasonable for base64.c to include byteswap.h in order to clarify
> base64_encode, because the byteswap.h's byteswapping operations wouldn't
> help. Also, even if base64_encode were swapping bytes, byte-swapping by
> itself is not a problem and I don't see why Heartbleed is relevant here,
> as the resulting subscripting operations are obviously in range. So I
> would like to to know exactly why Coverity infers that this particular
> byte-manipulation is problematic.
> 
> For example, does the problem go away if you replace 'b64c[to_uchar
> (in[0]) >> 2]' with 'b64c[(to_uchar (in[0]) >> 2) & 0x3f]'? If so, it
> would indicate that Coverity merely needs to improve their range analysis.

Who said that range analysis is actually used for the TAINTED_SCALAR checker 
implemented by Coverity?

While the technical details are not public, analyses like this can also work 
more or less on a syntax level, yet still provide useful enough results.

> More generally, does it help if you make simple changes to base64.c that
> might clarify the code a bit and in addition pacify Coverity? If so,
> that might be preferable. Something like the attached untested patch,
> say.

The code change you proposed does not make the false positives go away.

> The idea is to somehow convince Coverity that the code is OK, while
> not making it worse (and maybe making it better).

I do not think it is a good idea to change a piece of working code to make
a static analysis false positives magically disappear.  Even if you were able 
to eliminate the false positives now, they can reappear with a future version 
of Coverity.  The implementation of the sink detector in the TAINTED_SCALAR 
checker is a black box for us whereas the inline annotations are documented 
and usually backward compatible in future releases of Coverity.

> > /* coverity[-tainted_data_sink: arg-0] */
> 
> Just adding a line like that would be too cryptic. We can't expect the
> reader to know the ins and outs of a proprietary 3rd-party tool.

OK, got it.

> Instead, the comment would have to explain what's going on and why the
> comment is needed, well enough for the Gnulib audience.

Exactly what Bruno proposed and I like the idea.

> However, I'm hoping that we don't need the line at all, because it
> sounds like Coverity might be buggy here.

It is important to realize that static analysis algorithms like this are 
successful because they are computationally cheap and cope easily with 
unmodified real-world code.  They suffer from both false positives and
false negatives by design.  Getting precise results for checkers like
this is computationally expensive and in the general case impossible.

Kamil

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-10 11:32   ` Kamil Dudka
@ 2019-05-10 11:34     ` Florian Weimer
  2019-05-10 11:51       ` Kamil Dudka
  2019-05-10 23:36     ` Paul Eggert
  1 sibling, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2019-05-10 11:34 UTC (permalink / raw)
  To: Kamil Dudka; +Cc: Paul Eggert, bug-gnulib

* Kamil Dudka:

>> For example, how do you know that the reports are false positives and not
>> true positives?
>
> I think it was obvious from my previous explanation:
>
> (1) You need to check (by manual review) that the source of data is really 
> untrusted.
>
> (2) You need to check (by manual review) that there is no sufficient check
> on the data.
>
> (3) You need to check (by manual review) that the sink function is really 
> vulnerable to data from untrusted source.
>
> When doing step (3), I verified that Gnulib's base64_encode() can safely 
> process data from untrusted source.  Then I wanted to record this information 
> into the source code so that other users of Gnulib do not need to verify this 
> each time they run Coverity on a project that bundles Gnulib's implementation 
> of base64_encode().

Does the annotation make the base64 functions trusted in the sense that
they now turn untrusted data into trusted data?  That would be
undesirable in my opinion.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-09 20:35     ` Bruno Haible
  2019-05-09 21:55       ` Paul Eggert
@ 2019-05-10 11:41       ` Kamil Dudka
  1 sibling, 0 replies; 16+ messages in thread
From: Kamil Dudka @ 2019-05-10 11:41 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib

On Thursday, May 9, 2019 10:35:18 PM CEST Bruno Haible wrote:
> Hi Kamil,
> 
> > There are 3 important state-transitions in the data-flow analysis:
> > 
> > (1) obtaining data from untrusted source
> > (2) sanitizing the data (checking bounds etc.)
> > (3) unsafe use of untrusted data
> > 
> > gnulib's base64_encode() as seen by Coverity Analysis represents (3)
> > because its implementation uses byte swaps.  This is a heuristic that
> > is not always correct, so false positives may happen.  If you ask why
> > byte swaps are checked, I believe it was inspired by Heartbleed:
> > 
> > https://www.synopsys.com/blogs/software-security/detecting-heartbleed-with
> > -static-analysis/
> > 
> > The inline annotation that I proposed as a patch gives Coverity a hint
> > that gnulib's implementation of base64_encode() can safely process data
> > from untrusted sources.  The annotation is specific to the implementation
> > of the function, not to users of the function.
> 
> Ah, thanks for explaining. Now I agree: base64_encode produces the
> warning because of the (x << n) | (y >> m) expression patterns that
> resemble a byte swap. It would do so also for any other program that
> contains a base64_encode invocation with untrusted input as argument.
> 
> > > Does it need to be done in the source code at all?
> > 
> > Yes, in case of gnulib this is the only sensible option.
> > ...
> > Yes, various tools exist to waive false positives.  The problem is that
> > instances of these tools do not share data with each other in the
> > universe.
> > Consequently, developers have to repeatedly review these false positives
> > and waive them in each single instance of these tools.  And even worse
> > with
> > gnulib because these false positives are usually not matched across
> > different project that bundle gnulib, even if you have a single instance
> > of the waiving tool in your organisation.
> 
> So, I propose to bite the bullet, but at least put a reasonable comment.
> 
> 
> 2019-05-09  Kamil Dudka  <kdudka@redhat.com>
>             Bruno Haible  <bruno@clisp.org>
> 
> 	base64: Avoid false positive warning from Coverity.
> 	* lib/base64.c (base64_encode): Add special comment for Coverity.
> 
> diff --git a/lib/base64.c b/lib/base64.c
> index f3f7298..80428bb 100644
> --- a/lib/base64.c
> +++ b/lib/base64.c
> @@ -84,6 +84,11 @@ base64_encode_fast (const char *restrict in, size_t
> inlen, char *restrict out) If OUTLEN is less than BASE64_LENGTH(INLEN),
> write as many bytes as possible.  If OUTLEN is larger than
> BASE64_LENGTH(INLEN), also zero terminate the output buffer. */
> +/* Tell Coverity that this function works fine when called with IN
> +   pointing to untrusted input.  By default, Coverity, seeing the value
> +   shift expressions below, thinks that it is dangerous to call this
> +   function with untrusted input.
> +   coverity[-tainted_data_sink: arg-0]  */
>  void
>  base64_encode (const char *restrict in, size_t inlen,
>                 char *restrict out, size_t outlen)

Perfect.  I like the idea.  Unfortunately, Coverity seems to be picky about 
the format, so it needs to be spelled like this:

diff --git a/lib/base64.c b/lib/base64.c
index bb4dce8..5cbef9c 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -84,6 +84,11 @@ base64_encode_fast (const char *restrict in, size_t inlen, 
char *restrict out)
    If OUTLEN is less than BASE64_LENGTH(INLEN), write as many bytes as
    possible.  If OUTLEN is larger than BASE64_LENGTH(INLEN), also zero
    terminate the output buffer. */
+/* Tell Coverity that this function works fine when called with IN
+   pointing to untrusted input.  By default, Coverity, seeing the value
+   shift expressions below, thinks that it is dangerous to call this
+   function with untrusted input. */
+/* coverity[-tainted_data_sink: arg-0] */
 void
 base64_encode (const char *restrict in, size_t inlen,
                char *restrict out, size_t outlen)





^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-10 11:34     ` Florian Weimer
@ 2019-05-10 11:51       ` Kamil Dudka
  0 siblings, 0 replies; 16+ messages in thread
From: Kamil Dudka @ 2019-05-10 11:51 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Paul Eggert, bug-gnulib

On Friday, May 10, 2019 1:34:55 PM CEST Florian Weimer wrote:
> * Kamil Dudka:
> >> For example, how do you know that the reports are false positives and not
> >> true positives?
> > 
> > I think it was obvious from my previous explanation:
> > 
> > (1) You need to check (by manual review) that the source of data is really
> > untrusted.
> > 
> > (2) You need to check (by manual review) that there is no sufficient check
> > on the data.
> > 
> > (3) You need to check (by manual review) that the sink function is really
> > vulnerable to data from untrusted source.
> > 
> > When doing step (3), I verified that Gnulib's base64_encode() can safely
> > process data from untrusted source.  Then I wanted to record this
> > information into the source code so that other users of Gnulib do not
> > need to verify this each time they run Coverity on a project that bundles
> > Gnulib's implementation of base64_encode().
> 
> Does the annotation make the base64 functions trusted in the sense that
> they now turn untrusted data into trusted data?  That would be
> undesirable in my opinion.

Nope.  The following annotation:

    /* coverity[-tainted_data_sink: arg-0] */

... does not affect data sanitization at all, as I understand it.  It only 
tells Coverity that the `in` parameter of base64_encode() is not a taint sink.

On the other hand, I do not think that Coverity tracks propagation of tainted 
data across non-trivial operations on the data (implementation of the base64 
algorithm is IMO definitely out of scope).

Kamil

> Thanks,
> Florian




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-09 22:13         ` Bruno Haible
  2019-05-09 22:28           ` Paul Eggert
@ 2019-05-10 11:57           ` Kamil Dudka
  2019-05-10 14:11             ` Bruno Haible
  1 sibling, 1 reply; 16+ messages in thread
From: Kamil Dudka @ 2019-05-10 11:57 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Paul Eggert, bug-gnulib

On Friday, May 10, 2019 12:13:32 AM CEST Bruno Haible wrote:
> Hi Paul,
> 
> > Sorry, I'm still not following. Unless the tainted data is used to
> > calculate an array index, there's no problem with Heartbleed and the
> > Coverity heuristic should not diagnose a problem.
> 
> Yes, IF they were only using an algorithm and no heuristic,
> base64_encode would not be flagged as a dangerous consumer of
> untrusted input.
> 
> But their article
> https://www.synopsys.com/blogs/software-security/detecting-heartbleed-with-s
> tatic-analysis/ says:
> 
>   "Program analysis is hard and approximations and trade-offs are
>    absolutely mandatory. We’ve found that the best results come from
>    a combination of advanced algorithms and knowledge of idioms that
>    occur in real-world code."
> 
> So they are combining data flow analysis - in order to determine
> that the argument of base64_alloc is untrusted data - with a
> heuristic - "if a function contains array accesses with indices that
> are computed with ntohs calls, we should flag it as dangerous consumer".
> 
> > the proposed comment would be wrong as it would pacify
> > Coverity without fixing the real bug elsewhere.
> 
> The base64_encode function does not make a dangerous array
> access (only to the 'b64c' array). And its result is a string,
> not an integer, and therefore cannot be used as an array index
> either. Therefore adding this comment cannot silence real bugs.
> 
> But maybe it will be sufficient to mask all b64c arguments
> with '& 0x3f', like you already suggested in the other mail?
> 
> Bruno
> 
> diff --git a/lib/base64.c b/lib/base64.c
> index f3f7298..a00e0f4 100644
> --- a/lib/base64.c
> +++ b/lib/base64.c
> @@ -70,7 +70,7 @@ base64_encode_fast (const char *restrict in, size_t inlen,
> char *restrict out) {
>    while (inlen)
>      {
> -      *out++ = b64c[to_uchar (in[0]) >> 2];
> +      *out++ = b64c[(to_uchar (in[0]) >> 2) & 0x3f];
>        *out++ = b64c[((to_uchar (in[0]) << 4) + (to_uchar (in[1]) >> 4)) &
> 0x3f]; *out++ = b64c[((to_uchar (in[1]) << 2) + (to_uchar (in[2]) >> 6)) &
> 0x3f]; *out++ = b64c[to_uchar (in[2]) & 0x3f];
> @@ -103,7 +103,7 @@ base64_encode (const char *restrict in, size_t inlen,
> 
>    while (inlen && outlen)
>      {
> -      *out++ = b64c[to_uchar (in[0]) >> 2];
> +      *out++ = b64c[(to_uchar (in[0]) >> 2) & 0x3f];
>        if (!--outlen)
>          break;
>        *out++ = b64c[((to_uchar (in[0]) << 4)
> 
> Bruno

Thanks!  This also helps to suppress the false positives on cryptsetup
with Coverity Static Analysis version 2019.03.

Kamil




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-10 11:57           ` Kamil Dudka
@ 2019-05-10 14:11             ` Bruno Haible
  2019-05-10 14:32               ` Kamil Dudka
  0 siblings, 1 reply; 16+ messages in thread
From: Bruno Haible @ 2019-05-10 14:11 UTC (permalink / raw)
  To: Kamil Dudka; +Cc: Paul Eggert, bug-gnulib

Kamil Dudka wrote:
> Thanks!  This also helps to suppress the false positives on cryptsetup
> with Coverity Static Analysis version 2019.03.

Good! Since this is the approach that Paul prefers, I'm pushing this one:


2019-05-10  Bruno Haible  <bruno@clisp.org>

	base64: Avoid false positive warning from Coverity.
	Reported by Kamil Dudka <kdudka@redhat.com>.
	Idea by Paul Eggert.
	* lib/base64.c (base64_encode_fast, base64_encode): Add a no-op
	'& 0x3f' to the array index expressions. This convinces Coverity that
	there is no out-of-bounds array reference, regardless of the input.

diff --git a/lib/base64.c b/lib/base64.c
index f3f7298..a00e0f4 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -70,7 +70,7 @@ base64_encode_fast (const char *restrict in, size_t inlen, char *restrict out)
 {
   while (inlen)
     {
-      *out++ = b64c[to_uchar (in[0]) >> 2];
+      *out++ = b64c[(to_uchar (in[0]) >> 2) & 0x3f];
       *out++ = b64c[((to_uchar (in[0]) << 4) + (to_uchar (in[1]) >> 4)) & 0x3f];
       *out++ = b64c[((to_uchar (in[1]) << 2) + (to_uchar (in[2]) >> 6)) & 0x3f];
       *out++ = b64c[to_uchar (in[2]) & 0x3f];
@@ -103,7 +103,7 @@ base64_encode (const char *restrict in, size_t inlen,
 
   while (inlen && outlen)
     {
-      *out++ = b64c[to_uchar (in[0]) >> 2];
+      *out++ = b64c[(to_uchar (in[0]) >> 2) & 0x3f];
       if (!--outlen)
         break;
       *out++ = b64c[((to_uchar (in[0]) << 4)



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-10 14:11             ` Bruno Haible
@ 2019-05-10 14:32               ` Kamil Dudka
  0 siblings, 0 replies; 16+ messages in thread
From: Kamil Dudka @ 2019-05-10 14:32 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Paul Eggert, bug-gnulib

On Friday, May 10, 2019 4:11:45 PM CEST Bruno Haible wrote:
> Kamil Dudka wrote:
> > Thanks!  This also helps to suppress the false positives on cryptsetup
> > with Coverity Static Analysis version 2019.03.
> 
> Good! Since this is the approach that Paul prefers, I'm pushing this one:
> 
> 
> 2019-05-10  Bruno Haible  <bruno@clisp.org>
> 
> 	base64: Avoid false positive warning from Coverity.
> 	Reported by Kamil Dudka <kdudka@redhat.com>.
> 	Idea by Paul Eggert.
> 	* lib/base64.c (base64_encode_fast, base64_encode): Add a no-op
> 	'& 0x3f' to the array index expressions. This convinces Coverity that
> 	there is no out-of-bounds array reference, regardless of the input.
> 
> diff --git a/lib/base64.c b/lib/base64.c
> index f3f7298..a00e0f4 100644
> --- a/lib/base64.c
> +++ b/lib/base64.c
> @@ -70,7 +70,7 @@ base64_encode_fast (const char *restrict in, size_t inlen,
> char *restrict out) {
>    while (inlen)
>      {
> -      *out++ = b64c[to_uchar (in[0]) >> 2];
> +      *out++ = b64c[(to_uchar (in[0]) >> 2) & 0x3f];
>        *out++ = b64c[((to_uchar (in[0]) << 4) + (to_uchar (in[1]) >> 4)) &
> 0x3f]; *out++ = b64c[((to_uchar (in[1]) << 2) + (to_uchar (in[2]) >> 6)) &
> 0x3f]; *out++ = b64c[to_uchar (in[2]) & 0x3f];
> @@ -103,7 +103,7 @@ base64_encode (const char *restrict in, size_t inlen,
> 
>    while (inlen && outlen)
>      {
> -      *out++ = b64c[to_uchar (in[0]) >> 2];
> +      *out++ = b64c[(to_uchar (in[0]) >> 2) & 0x3f];
>        if (!--outlen)
>          break;
>        *out++ = b64c[((to_uchar (in[0]) << 4)

Works for me.  Thanks to both of you!

Kamil




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Coverity false positives triggered by gnulib's implementation of base64
  2019-05-10 11:32   ` Kamil Dudka
  2019-05-10 11:34     ` Florian Weimer
@ 2019-05-10 23:36     ` Paul Eggert
  1 sibling, 0 replies; 16+ messages in thread
From: Paul Eggert @ 2019-05-10 23:36 UTC (permalink / raw)
  To: Kamil Dudka; +Cc: bug-gnulib

On 5/10/19 4:32 AM, Kamil Dudka wrote:

> I do not think it is a good idea to change a piece of working code to make
> a static analysis false positives magically disappear.
I was thinking of making a change only if it makes the code a bit better 
even ignoring whether Coverity is used. Surely we wouldn't insist on 
slightly-worse code merely because we also want to further clutter it up 
with Coverity pacification.

> Getting precise results for checkers like
> this is computationally expensive and in the general case impossible.

Although that is true in general, in this particular case it's easy for 
an automated tool with Coverity's sophistication to check that the 
subscripts are in-range for the array. This is really a Coverity bug and 
I'd rather not add batches of comments to code just to cater to Coverity 
bugs. Particularly since Coverity is not free software and ordinary 
developers like me cannot use it.This sort of thing would send the wrong 
signal from the GNU project.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-05-10 23:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-07 14:22 Coverity false positives triggered by gnulib's implementation of base64 Kamil Dudka
2019-05-08  8:15 ` Bruno Haible
2019-05-09 16:14   ` Kamil Dudka
2019-05-09 20:35     ` Bruno Haible
2019-05-09 21:55       ` Paul Eggert
2019-05-09 22:13         ` Bruno Haible
2019-05-09 22:28           ` Paul Eggert
2019-05-10 11:57           ` Kamil Dudka
2019-05-10 14:11             ` Bruno Haible
2019-05-10 14:32               ` Kamil Dudka
2019-05-10 11:41       ` Kamil Dudka
2019-05-09 19:14 ` Paul Eggert
2019-05-10 11:32   ` Kamil Dudka
2019-05-10 11:34     ` Florian Weimer
2019-05-10 11:51       ` Kamil Dudka
2019-05-10 23:36     ` Paul Eggert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).