git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Alex Riesen <raa.lkml@gmail.com>,
	git@vger.kernel.org
Subject: [PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list
Date: Tue, 19 Sep 2017 22:27:05 -0700	[thread overview]
Message-ID: <20170920052705.GC126984@aiede.mtv.corp.google.com> (raw)
In-Reply-To: <xmqqd16mowig.fsf@gitster.mtv.corp.google.com>

From: Michael Haggerty <mhagger@alum.mit.edu>

If you pass a newly initialized or newly cleared `string_list` to
`for_each_string_list_item()`, then the latter does

    for (
            item = (list)->items; /* NULL */
            item < (list)->items + (list)->nr; /* NULL + 0 */
            ++item)

Even though this probably works almost everywhere, it is undefined
behavior, and it could plausibly cause highly-optimizing compilers to
misbehave.  C99 section 6.5.6 paragraph 8 explains:

    If both the pointer operand and the result point to elements
    of the same array object, or one past the last element of the
    array object, the evaluation shall not produce an overflow;
    otherwise, the behavior is undefined.

and (6.3.2.3.3) a null pointer does not point to anything.

Guard the loop with a NULL check to make the intent crystal clear to
even the most pedantic compiler.  A suitably clever compiler could let
the NULL check only run in the first iteration, but regardless, this
overhead is likely to be dwarfed by the work to be done on each item.

This problem was noticed by Coverity.

[jn: using a NULL check instead of a placeholder empty list;
 fleshed out the commit message based on mailing list discussion]

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 string-list.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> ...  But a quick test with gcc 4.8.4
>> -O2 finds that at least this compiler does not contain such an
>> optimization.  The overhead Michael Haggerty mentioned is real.
>
> Still, I have a feeling that users of string_list wouldn't care
> the overhead of single pointer NULL-ness check.
>
>  - apply.c collects conflicted paths and reports them with fprintf().
>
>  - builtin/clean.c uses the function to walk the list of paths to be
>    removed, and either does a human interaction (for "-i" codepath)
>    or goes to the filesystem to remove things.
>
>  - builtin/config.c uses it in get_urlmatch() in preparation for
>    doing network-y things.
>
>  - builtin/describe.c walks the list of exclude and include patterns
>    to run wildmatch on the candidate reference name to filter it out.
>
>  ...
>
> In all of these examples, what happens for each item in the loop
> seems to me far heavier than the overhead this macro adds.

Yes, agreed.  As a small tweak,

   #define for_each_string_list_item(item, list) \
	for (item = ...; item && ...; ...)

produces nicer assembly than

   #define for_each_string_list_item(item, list) \
	for (item = ...; list->items && ...; ...)

(By the way, the potential optimization I described isn't valid: we
know that when item == NULL and list->items == NULL, list->nr is
always zero, but the compiler has no way to know that.  So it can't
eliminate the NULL test.  For comparison, a suitably smart compiler
should be able to eliminate a 'list->nr != 0 &&' guard if 'list'
doesn't escape in the loop body.)

Recapping the other proposed fixes:

A. Make it an invariant of string_list that items is never NULL and
   update string_list_init et al to use an empty array.  This is
   pretty painless until you notice some other structs that embed
   string_list without using STRING_LIST_INIT.  Updating all those
   would be too painful.

B. #define for_each_string_list_item(item, list) \
	if (list->items) \
		for (item = ...; ...; ... )

   This breaks a caller like
	if (foo)
		for_each_string_list_item(item, list)
			...
	else
		...

   making it a non-starter.

C. As Gábor suggested,
   #define for_each_string_list_item(item, list) \
   	if (!list->items) \
		; /* nothing to do */ \
	else \
		for (item = ...; ...; ...)

   This handles the caller from (B) correctly.  But it produces
   compiler warnings for a caller like

	if (foo)
		for_each_string_list_item(item, list)
			...

   There is only one instance of that construct in git today.  It
   looks nicer anyway with braces, so this approach would also be
   promising.

D. Eliminate for_each_string_list_item and let callers just do

	unsigned int i;
	for (i = 0; i < list->nr; i++) {
		struct string_list_item *item = list->items[i];
		...
	}

   Having to declare item is unnecessarily verbose, decreasing the
   appeal of this option.  I think I like it anyway, but I wasn't able
   to convince coccinelle to do it.

E. Use subtraction instead of addition:
   #define for_each_string_list_item(item, list) \
   	for (item = ...; \
	     (item == list->items ? 0 : item - list->items) < nr; \
	     item++)

   I expected the compiler to figure out that this is a long way of writing
   (item - list->items), but at least with gcc 4.8.4 -O2, no such
   luck.  This generates uglier assembly than the NULL check.

diff --git a/string-list.h b/string-list.h
index 29bfb7ae45..79ae567cbc 100644
--- a/string-list.h
+++ b/string-list.h
@@ -32,8 +32,10 @@ void string_list_clear_func(struct string_list *list, string_list_clear_func_t c
 typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
 int for_each_string_list(struct string_list *list,
 			 string_list_each_func_t, void *cb_data);
-#define for_each_string_list_item(item,list) \
-	for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
+#define for_each_string_list_item(item,list)            \
+	for (item = (list)->items;                      \
+	     item && item < (list)->items + (list)->nr; \
+	     ++item)
 
 /*
  * Apply want to each item in list, retaining only the ones for which
-- 
2.14.1.821.g8fa685d3b7


  reply	other threads:[~2017-09-20  5:27 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-15 16:00 [PATCH] for_each_string_list_item(): behave correctly for empty list Michael Haggerty
2017-09-15 18:43 ` Jonathan Nieder
2017-09-16  4:06   ` Michael Haggerty
2017-09-16 11:51     ` SZEDER Gábor
2017-09-17 10:19       ` Michael Haggerty
2017-09-19 14:38     ` Kaartic Sivaraam
2017-09-20  1:38       ` Junio C Hamano
2017-09-20  1:43         ` Jonathan Nieder
2017-09-20  5:14           ` Junio C Hamano
2017-09-20  2:30       ` Jonathan Nieder
2017-09-20  3:54         ` Junio C Hamano
2017-09-20  5:27           ` Jonathan Nieder [this message]
2017-09-20  5:40             ` [PATCH v2] for_each_string_list_item: avoid undefined behavior " Junio C Hamano
2017-09-20  7:00             ` Michael Haggerty
2017-09-20  7:40             ` Kaartic Sivaraam
2017-09-20 12:22             ` [PATCH v2] doc: camelCase the config variables to improve readability Kaartic Sivaraam
2017-09-20 16:28             ` [PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list Andreas Schwab
2017-09-20 17:31               ` Jonathan Nieder
2017-09-20 21:51                 ` Andreas Schwab
2017-09-21  1:12                   ` Junio C Hamano
2017-09-21 15:39                     ` Andreas Schwab
2017-09-20  7:35         ` [PATCH] for_each_string_list_item(): behave correctly " Kaartic Sivaraam
2017-09-17  0:59 ` Junio C Hamano
2017-09-17 10:24   ` Michael Haggerty
2017-09-18  0:37     ` Junio C Hamano
2017-09-19  0:08       ` Stefan Beller
2017-09-19  6:51         ` Michael Haggerty
2017-09-19 13:38           ` SZEDER Gábor
2017-09-19 13:45             ` SZEDER Gábor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170920052705.GC126984@aiede.mtv.corp.google.com \
    --to=jrnieder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=kaarticsivaraam91196@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=raa.lkml@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).