tar + cpio - covscan issues

bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed

* tar + cpio - covscan issues
@ 2021-04-08 11:37 Ondrej Dubaj
  2021-04-10 10:26 ` Bruno Haible
  0 siblings, 1 reply; 17+ messages in thread
From: Ondrej Dubaj @ 2021-04-08 11:37 UTC (permalink / raw
  To: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 10418 bytes --]

Hello,

proposing patch for some of the issues found by coverity scan in tar-1.34

Patch:

diff --git a/gnu/malloc/scratch_buffer_dupfree.c
b/gnu/malloc/scratch_buffer_dupfree.c
index 775bff5..3b246f2 100644
--- a/gnu/malloc/scratch_buffer_dupfree.c
+++ b/gnu/malloc/scratch_buffer_dupfree.c
@@ -35,7 +35,13 @@ __libc_scratch_buffer_dupfree (struct scratch_buffer
*buffer, size_t size)
   else
     {
       void *copy = realloc (data, size);
-      return copy != NULL ? copy : data;
+      if (copy != NULL)
+      {
+        data = NULL;
+        return copy;
+      }
+      else
+        return data;
     }
 }
 libc_hidden_def (__libc_scratch_buffer_dupfree)
diff --git a/lib/wordsplit.c b/lib/wordsplit.c
index 661a4f8..6ccaa2a 100644
--- a/lib/wordsplit.c
+++ b/lib/wordsplit.c
@@ -615,7 +615,6 @@ coalesce_segment (struct wordsplit *wsp, struct
wordsplit_node *node)
          node->flags |= p->flags & _WSNF_QUOTE;
          wsnode_remove (wsp, p);
          stop = p == end;
-         wsnode_free (p);
        }
       p = next;
     }

In addition, there are some issues which are not resolved by this patch.
There is a compiler warning about issues in utimens.c, which I find as
false positives. Another false positive is memory leak in malloca.c. Issue
presented in stdopen.c might be actually a problem. Can you please
investigate it and give feedback ?

Covscan results:

Error: CPPCHECK_WARNING (CWE-401):
tar-1.34/gnu/malloc/scratch_buffer_dupfree.c:38: error[memleak]:
Memory leak: copy
#   36|       {
#   37|         void *copy = realloc (data, size);
#   38|->       return copy != NULL ? copy : data;
#   39|       }
#   40|   }

Error: CPPCHECK_WARNING (CWE-401):
tar-1.34/gnu/malloca.c:67: error[memleak]: Memory leak: mem
#   65|             ((small_t *) p)[-1] = p - mem;
#   66|             /* p     sa_alignment_max mod 2*sa_alignment_max.  */
#   67|->           return p;
#   68|           }
#   69|       }

Error: RESOURCE_LEAK (CWE-772):
tar-1.34/gnu/stdopen.c:51: open_fn: Returning handle opened by "open".
[Note: The source code implementation of the function has been
overridden by a user model.]
tar-1.34/gnu/stdopen.c:51: var_assign: Assigning: "full_fd" = handle
returned from "open("/dev/full", mode)".
tar-1.34/gnu/stdopen.c:52: var_assign: Assigning: "new_fd" = "full_fd".
tar-1.34/gnu/stdopen.c:62: leaked_handle: Handle variable "new_fd"
going out of scope leaks the handle.
tar-1.34/gnu/stdopen.c:62: leaked_handle: Handle variable "full_fd"
going out of scope leaks the handle.
#   60|                 return 0;
#   61|               }
#   62|->         }
#   63|       }
#   64|

Error: RESOURCE_LEAK (CWE-772):
tar-1.34/gnu/stdopen.c:52: open_fn: Returning handle opened by "open".
[Note: The source code implementation of the function has been
overridden by a user model.]
tar-1.34/gnu/stdopen.c:52: var_assign: Assigning: "new_fd" = handle
returned from "open("/dev/null", mode)".
tar-1.34/gnu/stdopen.c:62: leaked_handle: Handle variable "new_fd"
going out of scope leaks the handle.
#   60|                 return 0;
#   61|               }
#   62|->         }
#   63|       }
#   64|

Error: COMPILER_WARNING (CWE-758):
tar-1.34/gnu/utimens.c: scope_hint: In function 'fdutimens'
tar-1.34/gnu/utimens.c:399:17: warning[-Wstringop-overflow=]:
'update_timespec' accessing 16 bytes in a region of size 8
#  399 |       if (ts && update_timespec (&st, &ts))
#      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
tar-1.34/gnu/utimens.c:399:17: note: referencing argument 2 of type
'struct timespec * *'
tar-1.34/gnu/utimens.c:136:1: note: in a call to function 'update_timespec'
#  136 | update_timespec (struct stat const *statbuf, struct timespec *ts[2])
#      | ^~~~~~~~~~~~~~~
#  397|             && (fd < 0 ? stat (file, &st) : fstat (fd, &st)))
#  398|           return -1;
#  399|->       if (ts && update_timespec (&st, &ts))
#  400|           return 0;
#  401|       }

Error: COMPILER_WARNING (CWE-758):
tar-1.34/gnu/utimens.c: scope_hint: In function 'lutimens'
tar-1.34/gnu/utimens.c:612:17: warning[-Wstringop-overflow=]:
'update_timespec' accessing 16 bytes in a region of size 8
#  612 |       if (ts && update_timespec (&st, &ts))
#      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
tar-1.34/gnu/utimens.c:612:17: note: referencing argument 2 of type
'struct timespec * *'
tar-1.34/gnu/utimens.c:136:1: note: in a call to function 'update_timespec'
#  136 | update_timespec (struct stat const *statbuf, struct timespec *ts[2])
#      | ^~~~~~~~~~~~~~~
#  610|         if (adjustment_needed != 3 && lstat (file, &st))
#  611|           return -1;
#  612|->       if (ts && update_timespec (&st, &ts))
#  613|           return 0;
#  614|       }

Error: USE_AFTER_FREE (CWE-416):
tar-1.34/lib/wordsplit.c:683: freed_arg: "coalesce_segment" frees "p->next".
tar-1.34/lib/wordsplit.c:680: use_after_free: Using freed pointer "p->next".
#  678|     struct wordsplit_node *p;
#  679|
#  680|->   for (p = wsp->ws_head; p; p = p->next)
#  681|       {
#  682|         if (p->flags & _WSNF_JOIN)


As well, proposing patch for cpio-2.13:

Patch:

diff --git a/src/tar.c b/src/tar.c
index 99ef8a2..a5873e7 100644
--- a/src/tar.c
+++ b/src/tar.c
@@ -146,6 +146,7 @@ write_out_tar_header (struct cpio_file_stat *file_hdr,
int out_des)
   name_len = strlen (file_hdr->c_name);
   if (name_len <= TARNAMESIZE)
     {
+      memset(tar_hdr->name, '\0', name_len+1);
       strncpy (tar_hdr->name, file_hdr->c_name, name_len);
     }
   else
@@ -173,8 +174,9 @@ write_out_tar_header (struct cpio_file_stat *file_hdr,
int out_des)
  {
   /* process_copy_out makes sure that c_tar_linkname is shorter
      than TARLINKNAMESIZE.  */
+    memset(tar_hdr->linkname, '\0', TARLINKNAMESIZE);
   strncpy (tar_hdr->linkname, file_hdr->c_tar_linkname,
-   TARLINKNAMESIZE);
+   TARLINKNAMESIZE-1);
   tar_hdr->typeflag = LNKTYPE;
   to_ascii (tar_hdr->size, 0, 12, LG_8, true);
  }
@@ -200,8 +202,9 @@ write_out_tar_header (struct cpio_file_stat *file_hdr,
int out_des)
       tar_hdr->typeflag = SYMTYPE;
       /* process_copy_out makes sure that c_tar_linkname is shorter
  than TARLINKNAMESIZE.  */
+      memset(tar_hdr->linkname, '\0', TARLINKNAMESIZE);
       strncpy (tar_hdr->linkname, file_hdr->c_tar_linkname,
-       TARLINKNAMESIZE);
+       TARLINKNAMESIZE-1);
       to_ascii (tar_hdr->size, 0, 12, LG_8, true);
       break;
 #endif /* CP_IFLNK */
@@ -211,6 +214,7 @@ write_out_tar_header (struct cpio_file_stat *file_hdr,
int out_des)
     {
       char *name;

+      memset(tar_hdr->version, '\0', TVERSLEN+1);
       strncpy (tar_hdr->magic, TMAGIC, TMAGLEN);
       strncpy (tar_hdr->version, TVERSION, TVERSLEN);

In addition, there are some issues which are not resolved by this patch.
There is a compiler warning about issues in utimens.c (same as in tar),
which I find as false positives. Can you please investigate it and give
feedback ?

Thank you.

Ondrej

Covscan results:

Error: COMPILER_WARNING (CWE-758):
cpio-2.13/gnu/utimens.c: scope_hint: In function 'fdutimens'
cpio-2.13/gnu/utimens.c:296:17: warning[-Wstringop-overflow=]:
'update_timespec' accessing 16 bytes in a region of size 8
#  296 |       if (ts && update_timespec (&st, &ts))
#      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
cpio-2.13/gnu/utimens.c:296:17: note: referencing argument 2 of type
'struct timespec * *'
cpio-2.13/gnu/utimens.c:131:1: note: in a call to function 'update_timespec'
#  131 | update_timespec (struct stat const *statbuf, struct timespec *ts[2])
#      | ^~~~~~~~~~~~~~~
#  294|             && (fd < 0 ? stat (file, &st) : fstat (fd, &st)))
#  295|           return -1;
#  296|->       if (ts && update_timespec (&st, &ts))
#  297|           return 0;
#  298|       }

Error: COMPILER_WARNING (CWE-758):
cpio-2.13/gnu/utimens.c: scope_hint: In function 'lutimens'
cpio-2.13/gnu/utimens.c:507:17: warning[-Wstringop-overflow=]:
'update_timespec' accessing 16 bytes in a region of size 8
#  507 |       if (ts && update_timespec (&st, &ts))
#      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
cpio-2.13/gnu/utimens.c:507:17: note: referencing argument 2 of type
'struct timespec * *'
cpio-2.13/gnu/utimens.c:131:1: note: in a call to function 'update_timespec'
#  131 | update_timespec (struct stat const *statbuf, struct timespec *ts[2])
#      | ^~~~~~~~~~~~~~~
#  505|         if (adjustment_needed != 3 && lstat (file, &st))
#  506|           return -1;
#  507|->       if (ts && update_timespec (&st, &ts))
#  508|           return 0;
#  509|       }

Error: COMPILER_WARNING (CWE-758):
cpio-2.13/src/tar.c: scope_hint: In function 'write_out_tar_header'
cpio-2.13/src/tar.c:149:7: warning[-Wstringop-overflow=]: 'strncpy'
specified bound depends on the length of the source argument
cpio-2.13/src/tar.c:146:14: note: length computed here
#  147|     if (name_len <= TARNAMESIZE)
#  148|       {
#  149|->       strncpy (tar_hdr->name, file_hdr->c_name, name_len);
#  150|       }
#  151|     else

Error: BUFFER_SIZE (CWE-170):
cpio-2.13/src/tar.c:176: buffer_size_warning: Calling "strncpy" with a
maximum size argument of 100 bytes on destination array
"tar_hdr->linkname" of size 100 bytes might leave the destination
string unterminated.
#  174|   	  /* process_copy_out makes sure that c_tar_linkname is shorter
#  175|   	     than TARLINKNAMESIZE.  */
#  176|-> 	  strncpy (tar_hdr->linkname, file_hdr->c_tar_linkname,
#  177|   		   TARLINKNAMESIZE);
#  178|   	  tar_hdr->typeflag = LNKTYPE;

Error: BUFFER_SIZE (CWE-170):
cpio-2.13/src/tar.c:203: buffer_size_warning: Calling "strncpy" with a
maximum size argument of 100 bytes on destination array
"tar_hdr->linkname" of size 100 bytes might leave the destination
string unterminated.
#  201|         /* process_copy_out makes sure that c_tar_linkname is shorter
#  202|   	 than TARLINKNAMESIZE.  */
#  203|->       strncpy (tar_hdr->linkname, file_hdr->c_tar_linkname,
#  204|   	       TARLINKNAMESIZE);
#  205|         to_ascii (tar_hdr->size, 0, 12, LG_8, true);

Error: BUFFER_SIZE (CWE-120):
cpio-2.13/src/tar.c:215: buffer_size: Calling "strncpy" with a source
string whose length (2 chars) is greater than or equal to the size
argument (2) will fail to null-terminate "tar_hdr->version".
#  213|
#  214|         strncpy (tar_hdr->magic, TMAGIC, TMAGLEN);
#  215|->       strncpy (tar_hdr->version, TVERSION, TVERSLEN);
#  216|
#  217|         name = getuser (file_hdr->c_uid);

[-- Attachment #2: Type: text/html, Size: 12210 bytes --]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-08 11:37 tar + cpio - covscan issues Ondrej Dubaj
@ 2021-04-10 10:26 ` Bruno Haible
  2021-04-10 13:34   ` Kamil Dudka
  2021-04-12  5:19   ` Ondrej Dubaj
  0 siblings, 2 replies; 17+ messages in thread
From: Bruno Haible @ 2021-04-10 10:26 UTC (permalink / raw
  To: bug-gnulib; +Cc: Ondrej Dubaj

Hi Ondrej,

> proposing patch for some of the issues found by coverity scan in tar-1.34

Thanks for these reports.

When we get Coverity reports, we fix the things that are valid complaints
about the code, but we do NOT change the code to reduce the number of reported
issues. That is because
  1) Coverity has a UI where you can mark issues are false issues, even with
     a rationale, and such resolutions are even propagated when the same source
     file is used in a different project (such as gnulib vs. tar).
  2) About 80% to 90% of the reported issues are false issues. We would be
     seriously contorting the source code if we attempted to change the code
     to avoid the reports.

Bruno



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-10 10:26 ` Bruno Haible
@ 2021-04-10 13:34   ` Kamil Dudka
  2021-04-10 13:58     ` Bruno Haible
  2021-04-12  5:19   ` Ondrej Dubaj
  1 sibling, 1 reply; 17+ messages in thread
From: Kamil Dudka @ 2021-04-10 13:34 UTC (permalink / raw
  To: Bruno Haible; +Cc: bug-gnulib, Ondrej Dubaj

On Saturday, April 10, 2021 12:26:37 PM CEST Bruno Haible wrote:
> Hi Ondrej,
> 
> > proposing patch for some of the issues found by coverity scan in tar-1.34
> 
> Thanks for these reports.
> 
> When we get Coverity reports, we fix the things that are valid complaints
> about the code, but we do NOT change the code to reduce the number of
> reported issues. That is because

If you have enough time to manually review the same false positives over and 
over, this might work well for you.  Not everybody is in the same situation.

>   1) Coverity has a UI where you can mark issues are false issues, even with
> a rationale, and such resolutions are even propagated when the same source
> file is used in a different project (such as gnulib vs. tar).

So you have access to this UI, not everybody does.  Some developers prefer 
terminal-based workflow over web-based UI.  In any case, the data you enter 
through this UI is completely isolated from the open-source software that
you maintain.  Downstream consumers either have to feed their own instance
of the UI manually again, or just use something else without any cooperation 
with upstream.

> 2) About 80%
> to 90% of the reported issues are false issues. We would be seriously
> contorting the source code if we attempted to change the code to avoid the
> reports.

If you keep fixing real issues and ignoring false positives, such a situation 
is kind of expected.

Kamil

> Bruno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-10 13:34   ` Kamil Dudka
@ 2021-04-10 13:58     ` Bruno Haible
  2021-04-10 16:00       ` Kamil Dudka
  0 siblings, 1 reply; 17+ messages in thread
From: Bruno Haible @ 2021-04-10 13:58 UTC (permalink / raw
  To: Kamil Dudka; +Cc: bug-gnulib, Ondrej Dubaj

Kamil Dudka wrote:

> > When we get Coverity reports, we fix the things that are valid complaints
> > about the code, but we do NOT change the code to reduce the number of
> > reported issues. That is because
> 
> If you have enough time to manually review the same false positives over and 
> over, this might work well for you.  Not everybody is in the same situation.

Paul and I receive a mail with the *new* issues once a week. We never review
the same issue more than once.

> So you have access to this UI, not everybody does.  Some developers prefer 
> terminal-based workflow over web-based UI.

I didn't know a different workflow was possible.

But in that workflow, in which you control everything yourself (no SaaS),
you can surely commit into the repo
  - either the list of issues produced by the last run, or
  - a list of issues that you have found to be false ones,
and use that information to limit what you have to review in the next run?

No one forces you to review the same false positives over and over again.

> > 2) About 80%
> > to 90% of the reported issues are false issues. We would be seriously
> > contorting the source code if we attempted to change the code to avoid the
> > reports.
> 
> If you keep fixing real issues and ignoring false positives, such a situation 
> is kind of expected.

So you are in favour of adding workarounds such as the proposed

      if (copy != NULL)
        {
          data = NULL;
          return copy;
        }
      else
        return data;

in 5 to 10 places, in order to get a useful warning in 1 place? I am not.

Bruno



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-10 13:58     ` Bruno Haible
@ 2021-04-10 16:00       ` Kamil Dudka
  2021-04-10 18:40         ` Bruno Haible
  0 siblings, 1 reply; 17+ messages in thread
From: Kamil Dudka @ 2021-04-10 16:00 UTC (permalink / raw
  To: Bruno Haible; +Cc: bug-gnulib, Ondrej Dubaj

On Saturday, April 10, 2021 3:58:57 PM CEST Bruno Haible wrote:
> Kamil Dudka wrote:
> Paul and I receive a mail with the *new* issues once a week. We never review
> the same issue more than once.

I was not talking about the private notifications you get from Coverity Scan.  
I meant the public reports on this mailing-list like the one that Ondrej sent.  
As gnulib is embedded in multiple RPM packages of Fedora/RHEL, such reports
are going to come periodically until you change your attitude to handling 
false positives upstream.

> > So you have access to this UI, not everybody does.  Some developers prefer
> > terminal-based workflow over web-based UI.
> 
> I didn't know a different workflow was possible.

Red Hat uses csdiff to process the results of static analyzers (not only from 
Coverity).  It is an open source tool that uses open data formats:

    https://github.com/kdudka/csdiff

> But in that workflow, in which you control everything yourself (no SaaS),
> you can surely commit into the repo
>   - either the list of issues produced by the last run, or
>   - a list of issues that you have found to be false ones,
> and use that information to limit what you have to review in the next run?

Sure, we use both.  The problem is that this is a duplicated effort to your 
reviews of Coverity results upstream.  And many downstream consumers have to 
duplicate the effort as well.

The main advantage of code improvements and inline code annotations is that 
they travel together with the source code when the files are moved in the 
source tree, across the projects, or embedded into other projects.  All the 
downstream consumers can consume these improvements at no additional cost.

> No one forces you to review the same false positives over and over again.
> 
> > > 2) About 80%
> > > to 90% of the reported issues are false issues. We would be seriously
> > > contorting the source code if we attempted to change the code to avoid
> > > the
> > > reports.
> > 
> > If you keep fixing real issues and ignoring false positives, such a
> > situation is kind of expected.
> 
> So you are in favour of adding workarounds such as the proposed
> 
>       if (copy != NULL)
>         {
>           data = NULL;
>           return copy;
>         }
>       else
>         return data;
> 
> in 5 to 10 places, in order to get a useful warning in 1 place? I am not.
> 
> Bruno

This is a really bad example and I agree that such changes should be rejected.  
It over-complicates the code and introduces a dead store, which will probably 
trigger some noise with other tools.

By the way, this was supposed to eliminate a report of Cppcheck, which the 
mentioned UI of Coverity does not cover at all.  It could be easily silenced 
with an inline suppression though:

--- a/gnu/malloc/scratch_buffer_dupfree.c
+++ b/gnu/malloc/scratch_buffer_dupfree.c
@@ -35,6 +35,7 @@ __libc_scratch_buffer_dupfree (struct scratch_buffer 
*buffer, size_t size)
   else
     {
       void *copy = realloc (data, size);
+      /* cppcheck-suppress memleak */
       return copy != NULL ? copy : data;
     }
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-10 16:00       ` Kamil Dudka
@ 2021-04-10 18:40         ` Bruno Haible
  2021-04-11 19:50           ` Paul Eggert
  2021-04-15 20:07           ` Kamil Dudka
  0 siblings, 2 replies; 17+ messages in thread
From: Bruno Haible @ 2021-04-10 18:40 UTC (permalink / raw
  To: Kamil Dudka; +Cc: bug-gnulib

Hi Kamil,

> I meant the public reports on this mailing-list like the one that Ondrej sent.  
> As gnulib is embedded in multiple RPM packages of Fedora/RHEL, such reports
> are going to come periodically until you change your attitude to handling 
> false positives upstream.
> ...
> The problem is that this is a duplicated effort to your 
> reviews of Coverity results upstream.

So far the number of these reports has been small and manageable. When it
gets higher, we may very well use your 'csdiff' package.

Is there, besides this package, also a "best practices" document how to
use it (e.g. where to store the results of previous run, how to map categories
to priorities, etc.)?

> And many downstream consumers have to 
> duplicate the effort as well.

Downstream consumers can exclude the gnulib-copied directories using the 'csgrep'
program, AFAIU?

> The main advantage of code improvements and inline code annotations is that 
> they travel together with the source code when the files are moved in the 
> source tree, across the projects, or embedded into other projects.  All the 
> downstream consumers can consume these improvements at no additional cost.

True. But it clutters up the source code. For tools that produce 5-10 times
more false reports than good reports, I wouldn't do this. Things are different
for tools or warning categories which, on average, produce > 75% good reports
(like, specific warnings included in "gcc -Wall").

Bruno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-10 18:40         ` Bruno Haible
@ 2021-04-11 19:50           ` Paul Eggert
  2021-04-15 20:07           ` Kamil Dudka
  1 sibling, 0 replies; 17+ messages in thread
From: Paul Eggert @ 2021-04-11 19:50 UTC (permalink / raw
  To: Kamil Dudka; +Cc: Gnulib bugs

On 4/10/21 11:40 AM, Bruno Haible wrote:
> True. But it clutters up the source code. For tools that produce 5-10 times
> more false reports than good reports, I wouldn't do this.

Likewise. Static analysis tools come and go, but source code is forever.

I like Bruno's suggestion of suppressing analysis of Gnulib-copied 
source files. That's longstanding practice when using Gnulib in other 
projects. For example, in Coreutils, './configure --enable-gcc-warnings' 
uses a different set of gcc -W flags for Gnulib-supplied code because 
Gnulib doesn't bother pacifying GCC about some issues where Coreutils 
does bother (they would all be false alarms anyway). You could do 
something similar with Coverity, cppcheck, etc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-10 10:26 ` Bruno Haible
  2021-04-10 13:34   ` Kamil Dudka
@ 2021-04-12  5:19   ` Ondrej Dubaj
  2021-04-12  7:42     ` Paul Eggert
  1 sibling, 1 reply; 17+ messages in thread
From: Ondrej Dubaj @ 2021-04-12  5:19 UTC (permalink / raw
  To: Bruno Haible; +Cc: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]

Thanks for the explanation, which of the reports do you consider as false
positives and which as real issues ? If there are some real issues, are you
willing to fix them ?

Thank you.

Ondrej

On Sat, Apr 10, 2021 at 12:32 PM Bruno Haible <bruno@clisp.org> wrote:

> Hi Ondrej,
>
> > proposing patch for some of the issues found by coverity scan in tar-1.34
>
> Thanks for these reports.
>
> When we get Coverity reports, we fix the things that are valid complaints
> about the code, but we do NOT change the code to reduce the number of
> reported
> issues. That is because
>   1) Coverity has a UI where you can mark issues are false issues, even
> with
>      a rationale, and such resolutions are even propagated when the same
> source
>      file is used in a different project (such as gnulib vs. tar).
>   2) About 80% to 90% of the reported issues are false issues. We would be
>      seriously contorting the source code if we attempted to change the
> code
>      to avoid the reports.
>
> Bruno
>
>

[-- Attachment #2: Type: text/html, Size: 1414 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-12  5:19   ` Ondrej Dubaj
@ 2021-04-12  7:42     ` Paul Eggert
  2021-04-12  7:47       ` Ondrej Dubaj
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Eggert @ 2021-04-12  7:42 UTC (permalink / raw
  To: Ondrej Dubaj; +Cc: bug-gnulib

On 4/11/21 10:19 PM, Ondrej Dubaj wrote:
> which of the reports do you consider as 
> false positives and which as real issues ? If there are some real 
> issues, are you willing to fix them ?

Every report is a false positive. I assume you're talking about the 
reports here:

https://lists.gnu.org/r/bug-gnulib/2021-04/msg00096.html

The utimens.c false positives are apparently due to a confusing but 
technically-valid use of pointers, which I fixed in Gnulib here:

https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=a3a946f670718d0dee5a7425ad5ac0a29fb46ea1

No other fixes should be needed.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-12  7:42     ` Paul Eggert
@ 2021-04-12  7:47       ` Ondrej Dubaj
  0 siblings, 0 replies; 17+ messages in thread
From: Ondrej Dubaj @ 2021-04-12  7:47 UTC (permalink / raw
  To: Paul Eggert; +Cc: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

Thanks for your reply!

On Mon, Apr 12, 2021 at 9:42 AM Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 4/11/21 10:19 PM, Ondrej Dubaj wrote:
> > which of the reports do you consider as
> > false positives and which as real issues ? If there are some real
> > issues, are you willing to fix them ?
>
> Every report is a false positive. I assume you're talking about the
> reports here:
>
> https://lists.gnu.org/r/bug-gnulib/2021-04/msg00096.html
>
> The utimens.c false positives are apparently due to a confusing but
> technically-valid use of pointers, which I fixed in Gnulib here:
>
>
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=a3a946f670718d0dee5a7425ad5ac0a29fb46ea1
>
> No other fixes should be needed.
>
>

[-- Attachment #2: Type: text/html, Size: 1328 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-10 18:40         ` Bruno Haible
  2021-04-11 19:50           ` Paul Eggert
@ 2021-04-15 20:07           ` Kamil Dudka
  2021-04-15 20:30             ` Paul Eggert
  2021-04-16 22:01             ` Bruno Haible
  1 sibling, 2 replies; 17+ messages in thread
From: Kamil Dudka @ 2021-04-15 20:07 UTC (permalink / raw
  To: Bruno Haible; +Cc: bug-gnulib

Hi Bruno,

On Saturday, April 10, 2021 8:40:06 PM CEST Bruno Haible wrote:
> Hi Kamil,
> 
> > I meant the public reports on this mailing-list like the one that Ondrej
> > sent. As gnulib is embedded in multiple RPM packages of Fedora/RHEL, such
> > reports are going to come periodically until you change your attitude to
> > handling false positives upstream.
> > ...
> > The problem is that this is a duplicated effort to your
> > reviews of Coverity results upstream.
> 
> So far the number of these reports has been small and manageable. When it
> gets higher, we may very well use your 'csdiff' package.
> 
> Is there, besides this package, also a "best practices" document how to
> use it (e.g. where to store the results of previous run, how to map
> categories to priorities, etc.)?

these tools can be fed with Coverity's JSON format, which can be obtained
with `cov-format-errors --json-output-v7 ...`.  I am not sure whether this 
utility is available to users of scan.coverity.com though.  csdiff/csgrep can 
also be fed with the diagnostic output of GCC and other tools with compatible 
output format.

Storing results from previous runs works reliably only until you update the 
static analyzers.  So you either need to always check their versions or build 
the code twice to obtain comparable results.

Priorities are usually not used for upstream scans because upstream developers 
either fix/suppress all the reported issues, or at least the issues that are 
introduced by the changes to be merged.

We have to use priorities downstream for some projects though.  For example, 
our scan of coreutils-8.32-18.el9 resulted in 237 reports, which would be 
nearly impossible for me to review.  Out of them 11 reports were classified
as important to be reviewed/fixed.

> > And many downstream consumers have to
> > duplicate the effort as well.
> 
> Downstream consumers can exclude the gnulib-copied directories using the
> 'csgrep' program, AFAIU?

Not so easily.  csgrep can filter the results by path in the source tree.
The problem with gnulib is that different projects embed it in different 
directories.  For example, coreutils has it in /lib whereas findutils has
it in /gl/lib while /lib contains other source files that we do not want
to exclude.  So we would have to maintain such exclusion lists per project.

People maintaining their own medium-size projects can easily play with this.  
I am in a different situation when I need to scan 3700 distinct projects and 
approx. 480 million lines of code with more or less the same manpower ;-)

Kamil

> > The main advantage of code improvements and inline code annotations is
> > that
> > they travel together with the source code when the files are moved in the
> > source tree, across the projects, or embedded into other projects.  All
> > the
> > downstream consumers can consume these improvements at no additional cost.
> 
> True. But it clutters up the source code. For tools that produce 5-10 times
> more false reports than good reports, I wouldn't do this. Things are
> different for tools or warning categories which, on average, produce > 75%
> good reports (like, specific warnings included in "gcc -Wall").
> 
> Bruno

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-15 20:07           ` Kamil Dudka
@ 2021-04-15 20:30             ` Paul Eggert
  2021-04-16  8:02               ` Kamil Dudka
  2021-04-16 22:01             ` Bruno Haible
  1 sibling, 1 reply; 17+ messages in thread
From: Paul Eggert @ 2021-04-15 20:30 UTC (permalink / raw
  To: Kamil Dudka, Bruno Haible; +Cc: bug-gnulib

On 4/15/21 1:07 PM, Kamil Dudka wrote:
> People maintaining their own medium-size projects can easily play with this.
> I am in a different situation when I need to scan 3700 distinct projects and
> approx. 480 million lines of code with more or less the same manpower ;-)

I can appreciate the amount of work it takes to maintain all that 
scanning. Still, we have to be careful here not to let the tail wag the 
dog. The false-alarm rate from Coverity is too high for us to install 
patches needed only to pacify Coverity. Similarly for GCC with all its 
warning flags enabled.

> we would have to maintain such exclusion lists per project.

My guess is that overall this would be less work, than the work of 
installing and reliably maintaining patches that pacify all combinations 
of Coverity and 'gcc -Wall -Wextra -Wetc' flags used by any downstream 
projects, without breaking or slowing down anything in any project. (But 
of course messing with exclusion lists would be work that you'd have to 
do, rather than us. :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-15 20:30             ` Paul Eggert
@ 2021-04-16  8:02               ` Kamil Dudka
  2021-04-16 18:56                 ` Paul Eggert
  0 siblings, 1 reply; 17+ messages in thread
From: Kamil Dudka @ 2021-04-16  8:02 UTC (permalink / raw
  To: Paul Eggert; +Cc: bug-gnulib, Bruno Haible

On Thursday, April 15, 2021 10:30:14 PM CEST Paul Eggert wrote:
> On 4/15/21 1:07 PM, Kamil Dudka wrote:
> > People maintaining their own medium-size projects can easily play with
> > this. I am in a different situation when I need to scan 3700 distinct
> > projects and approx. 480 million lines of code with more or less the same
> > manpower ;-)
> I can appreciate the amount of work it takes to maintain all that
> scanning. Still, we have to be careful here not to let the tail wag the
> dog. The false-alarm rate from Coverity is too high for us to install
> patches needed only to pacify Coverity. Similarly for GCC with all its
> warning flags enabled.
> 
> > we would have to maintain such exclusion lists per project.
> 
> My guess is that overall this would be less work, than the work of
> installing and reliably maintaining patches that pacify all combinations
> of Coverity and 'gcc -Wall -Wextra -Wetc' flags used by any downstream
> projects, without breaking or slowing down anything in any project. (But
> of course messing with exclusion lists would be work that you'd have to
> do, rather than us. :-)

Not only me.  This affects all downstream consumers of gnulib-based projects.

On a second thought, excluding gnulib source code from static analysis is not 
an option for us anyway.  We use such exclusion lists for unit tests that run 
in our test environment but that are not distributed to customers.

gnulib is a different case because some parts of its code run in production 
environment of our customers.  Red Hat certifies the software and we cannot 
say that we skipped static analysis of gnulib because upstream already did it.  
We have to (re)verify the software that we distribute in the end.

I am reading your responses as "upstream is not going to change anything".  We 
will have to find some ways to deduplicate and record these false positives on 
our side then.

Kamil

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-16  8:02               ` Kamil Dudka
@ 2021-04-16 18:56                 ` Paul Eggert
  2021-04-16 22:07                   ` Bruno Haible
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Eggert @ 2021-04-16 18:56 UTC (permalink / raw
  To: Kamil Dudka; +Cc: bug-gnulib, Bruno Haible

On 4/16/21 1:02 AM, Kamil Dudka wrote:
> We have to (re)verify the software that we distribute in the end.

Understandable.

> I am reading your responses as "upstream is not going to change anything".  We
> will have to find some ways to deduplicate and record these false positives on
> our side then.

Another possibility would be to libraryize Gnulib, scan that library 
once and record its false positives on your side once, and then change 
Gnulib-using packages to use that library instead of their in-source 
Gnulib copies. This would also be some work on your side, but it might 
fit better into your workflow.

One qualm I have with this idea, is that whole-program static analysis 
can do a better job than per-module static analysis. But you're already 
giving up on whole-program analysis with the other libraries, and adding 
one more library to the mix shouldn't hurt much.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-15 20:07           ` Kamil Dudka
  2021-04-15 20:30             ` Paul Eggert
@ 2021-04-16 22:01             ` Bruno Haible
  2021-04-17 14:21               ` Kamil Dudka
  1 sibling, 1 reply; 17+ messages in thread
From: Bruno Haible @ 2021-04-16 22:01 UTC (permalink / raw
  To: Kamil Dudka; +Cc: bug-gnulib

Kamil Dudka wrote:
> > Downstream consumers can exclude the gnulib-copied directories using the
> > 'csgrep' program, AFAIU?
> 
> Not so easily.  csgrep can filter the results by path in the source tree.
> The problem with gnulib is that different projects embed it in different 
> directories.  For example, coreutils has it in /lib whereas findutils has
> it in /gl/lib while /lib contains other source files that we do not want
> to exclude.  So we would have to maintain such exclusion lists per project.
> 
> People maintaining their own medium-size projects can easily play with this.  
> I am in a different situation when I need to scan 3700 distinct projects and 
> approx. 480 million lines of code with more or less the same manpower ;-)

These project-specific settings regarding gnulib are stored in a file named
'gnulib-cache.m4' by gnulib-tool.m4. Currently, few packages are storing this
file under version control or packaging it in tarballs. But we could change
this by documenting that it should be included in the tarballs, or by
modifying gnulib-tool slightly.

Are you working with git repository checkouts or with tarballs?

Bruno



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-16 18:56                 ` Paul Eggert
@ 2021-04-16 22:07                   ` Bruno Haible
  0 siblings, 0 replies; 17+ messages in thread
From: Bruno Haible @ 2021-04-16 22:07 UTC (permalink / raw
  To: Paul Eggert; +Cc: Kamil Dudka, bug-gnulib

Paul Eggert wrote:
> Another possibility would be to libraryize Gnulib, scan that library 
> once and record its false positives on your side once, and then change 
> Gnulib-using packages to use that library instead of their in-source 
> Gnulib copies. This would also be some work on your side, but it might 
> fit better into your workflow.

This is nontrivial work. We started some of it in 2010; the current status
is written up in the file 'STATUS-libposix'. We haven't worked on it
for 10 years. Part of the problem is that we have no clue how to address
the versioning of this library.

Bruno



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: tar + cpio - covscan issues
  2021-04-16 22:01             ` Bruno Haible
@ 2021-04-17 14:21               ` Kamil Dudka
  0 siblings, 0 replies; 17+ messages in thread
From: Kamil Dudka @ 2021-04-17 14:21 UTC (permalink / raw
  To: Bruno Haible; +Cc: bug-gnulib

On Saturday, April 17, 2021 12:01:56 AM CEST Bruno Haible wrote:
> Kamil Dudka wrote:
> > > Downstream consumers can exclude the gnulib-copied directories using the
> > > 'csgrep' program, AFAIU?
> > 
> > Not so easily.  csgrep can filter the results by path in the source tree.
> > The problem with gnulib is that different projects embed it in different
> > directories.  For example, coreutils has it in /lib whereas findutils has
> > it in /gl/lib while /lib contains other source files that we do not want
> > to exclude.  So we would have to maintain such exclusion lists per
> > project.
> > 
> > People maintaining their own medium-size projects can easily play with
> > this. I am in a different situation when I need to scan 3700 distinct
> > projects and approx. 480 million lines of code with more or less the same
> > manpower ;-)
> These project-specific settings regarding gnulib are stored in a file named
> 'gnulib-cache.m4' by gnulib-tool.m4. Currently, few packages are storing
> this file under version control or packaging it in tarballs. But we could
> change this by documenting that it should be included in the tarballs, or
> by modifying gnulib-tool slightly.
> 
> Are you working with git repository checkouts or with tarballs?
> 
> Bruno

The packages that I am scanning now are based on distribution tarballs but 
there is currently some effort to provide git-based workflow optionally for 
Fedora/RHEL packages where maintainers prefer it:

    https://packit.dev/docs/source-git/

I am not sure how much they take gnulib into account while developing this.

Kamil




^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-04-17 14:22 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-04-08 11:37 tar + cpio - covscan issues Ondrej Dubaj
2021-04-10 10:26 ` Bruno Haible
2021-04-10 13:34   ` Kamil Dudka
2021-04-10 13:58     ` Bruno Haible
2021-04-10 16:00       ` Kamil Dudka
2021-04-10 18:40         ` Bruno Haible
2021-04-11 19:50           ` Paul Eggert
2021-04-15 20:07           ` Kamil Dudka
2021-04-15 20:30             ` Paul Eggert
2021-04-16  8:02               ` Kamil Dudka
2021-04-16 18:56                 ` Paul Eggert
2021-04-16 22:07                   ` Bruno Haible
2021-04-16 22:01             ` Bruno Haible
2021-04-17 14:21               ` Kamil Dudka
2021-04-12  5:19   ` Ondrej Dubaj
2021-04-12  7:42     ` Paul Eggert
2021-04-12  7:47       ` Ondrej Dubaj

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).