Re: Coverity false positives triggered by gnulib's implementation of base64

From: Paul Eggert <eggert@cs.ucla.edu>
To: Kamil Dudka <kdudka@redhat.com>
Cc: bug-gnulib@gnu.org
Subject: Re: Coverity false positives triggered by gnulib's implementation of base64
Date: Thu, 9 May 2019 12:14:40 -0700	[thread overview]
Message-ID: <1324b17b-a3a8-e1fe-f781-f360343b89a1@cs.ucla.edu> (raw)
In-Reply-To: <2475310.f135Qeco9B@kdudka-nb>

[-- Attachment #1: Type: text/plain, Size: 2585 bytes --]

On 5/7/19 7:22 AM, Kamil Dudka wrote:
> It triggered the following false positives in the cryptsetup project:
>
> Error: TAINTED_SCALAR:
> lib/luks2/luks2_digest_pbkdf2.c:117: tainted_data_argument: Calling function "crypt_random_get" taints argument "salt".
> lib/luks2/luks2_digest_pbkdf2.c:157: tainted_data: Passing tainted variable "salt" to a tainted sink.
>
> Error: TAINTED_SCALAR:
> lib/luks2/luks2_keyslot_luks2.c:445: tainted_data_argument: Calling function "crypt_random_get" taints argument "salt".
> lib/luks2/luks2_keyslot_luks2.c:448: tainted_data: Passing tainted variable "salt" to a tainted sink.
>
>
> ... but it can affect other gnulib-based projects, too.

Can you explain in more detail what these diagnostic messages mean, and
why this is Gnulib's "fault"? For example, how do you know that the
reports are false positives and not true positives? That sort of thing.
Normally I'm a fan of static analysis, but it can go too far sometimes....

> gnulib's base64_encode() as seen by Coverity Analysis represents (3)
> because its implementation uses byte swaps.
base64_encode does not actually swap bytes. For example, it wouldn't be
reasonable for base64.c to include byteswap.h in order to clarify
base64_encode, because the byteswap.h's byteswapping operations wouldn't
help. Also, even if base64_encode were swapping bytes, byte-swapping by
itself is not a problem and I don't see why Heartbleed is relevant here,
as the resulting subscripting operations are obviously in range. So I
would like to to know exactly why Coverity infers that this particular
byte-manipulation is problematic.

For example, does the problem go away if you replace 'b64c[to_uchar
(in[0]) >> 2]' with 'b64c[(to_uchar (in[0]) >> 2) & 0x3f]'? If so, it
would indicate that Coverity merely needs to improve their range analysis.

More generally, does it help if you make simple changes to base64.c that
might clarify the code a bit and in addition pacify Coverity? If so,
that might be preferable. Something like the attached untested patch,
say. The idea is to somehow convince Coverity that the code is OK, while
not making it worse (and maybe making it better).


> /* coverity[-tainted_data_sink: arg-0] */
Just adding a line like that would be too cryptic. We can't expect the
reader to know the ins and outs of a proprietary 3rd-party tool.
Instead, the comment would have to explain what's going on and why the
comment is needed, well enough for the Gnulib audience.

However, I'm hoping that we don't need the line at all, because it
sounds like Coverity might be buggy here.


[-- Attachment #2: base64.patch --]
[-- Type: text/x-patch, Size: 2299 bytes --]

diff --git a/lib/base64.c b/lib/base64.c
index f3f7298aa..786b0f418 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -64,11 +64,11 @@ static const char b64c[64] =
 
 /* Base64 encode IN array of size INLEN into OUT array. OUT needs
    to be of length >= BASE64_LENGTH(INLEN), and INLEN needs to be
-   a multiple of 3.  */
+   a positive multiple of 3.  */
 static void
 base64_encode_fast (const char *restrict in, size_t inlen, char *restrict out)
 {
-  while (inlen)
+  do
     {
       *out++ = b64c[to_uchar (in[0]) >> 2];
       *out++ = b64c[((to_uchar (in[0]) << 4) + (to_uchar (in[1]) >> 4)) & 0x3f];
@@ -78,6 +78,7 @@ base64_encode_fast (const char *restrict in, size_t inlen, char *restrict out)
       inlen -= 3;
       in += 3;
     }
+  while (inlen);
 }
 
 /* Base64 encode IN array of size INLEN into OUT array of size OUTLEN.
@@ -88,6 +89,9 @@ void
 base64_encode (const char *restrict in, size_t inlen,
                char *restrict out, size_t outlen)
 {
+  if (!outlen)
+    return;
+
   /* Note this outlen constraint can be enforced at compile time.
      I.E. that the output buffer is exactly large enough to hold
      the encoded inlen bytes.  The inlen constraints (of corresponding
@@ -101,16 +105,17 @@ base64_encode (const char *restrict in, size_t inlen,
       return;
     }
 
-  while (inlen && outlen)
+  if (inlen)
+   while (true)
     {
       *out++ = b64c[to_uchar (in[0]) >> 2];
       if (!--outlen)
-        break;
+        return;
       *out++ = b64c[((to_uchar (in[0]) << 4)
                        + (--inlen ? to_uchar (in[1]) >> 4 : 0))
                       & 0x3f];
       if (!--outlen)
-        break;
+        return;
       *out++ =
         (inlen
          ? b64c[((to_uchar (in[1]) << 2)
@@ -118,18 +123,17 @@ base64_encode (const char *restrict in, size_t inlen,
                   & 0x3f]
          : '=');
       if (!--outlen)
-        break;
+        return;
       *out++ = inlen ? b64c[to_uchar (in[2]) & 0x3f] : '=';
       if (!--outlen)
+        return;
+      if (inlen <= 1)
         break;
-      if (inlen)
-        inlen--;
-      if (inlen)
-        in += 3;
+      inlen--;
+      in += 3;
     }
 
-  if (outlen)
-    *out = '\0';
+  *out = '\0';
 }
 
 /* Allocate a buffer and store zero terminated base64 encoded data