From: Jonathan Nieder <jrnieder@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com>,
Michael Haggerty <mhagger@alum.mit.edu>,
Alex Riesen <raa.lkml@gmail.com>,
git@vger.kernel.org
Subject: [PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list
Date: Tue, 19 Sep 2017 22:27:05 -0700
Message-ID: <20170920052705.GC126984@aiede.mtv.corp.google.com>
In-Reply-To: <xmqqd16mowig.fsf@gitster.mtv.corp.google.com>
From: Michael Haggerty <mhagger@alum.mit.edu>
If you pass a newly initialized or newly cleared `string_list` to
`for_each_string_list_item()`, then the latter does

	for (
		item = (list)->items; /* NULL */
		item < (list)->items + (list)->nr; /* NULL + 0 */
		++item)

Even though this probably works almost everywhere, it is undefined
behavior, and it could plausibly cause highly-optimizing compilers to
misbehave. C99 section 6.5.6 paragraph 8 explains:

	If both the pointer operand and the result point to elements
	of the same array object, or one past the last element of the
	array object, the evaluation shall not produce an overflow;
	otherwise, the behavior is undefined.

and (section 6.3.2.3 paragraph 3) a null pointer does not point to anything.
Guard the loop with a NULL check to make the intent crystal clear to
even the most pedantic compiler. A suitably clever compiler could let
the NULL check only run in the first iteration, but regardless, this
overhead is likely to be dwarfed by the work to be done on each item.
This problem was noticed by Coverity.
[jn: using a NULL check instead of a placeholder empty list;
fleshed out the commit message based on mailing list discussion]
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
string-list.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
>> ... But a quick test with gcc 4.8.4
>> -O2 finds that at least this compiler does not contain such an
>> optimization. The overhead Michael Haggerty mentioned is real.
>
> Still, I have a feeling that users of string_list wouldn't care
> about the overhead of a single pointer NULL-ness check.
>
> - apply.c collects conflicted paths and reports them with fprintf().
>
> - builtin/clean.c uses the function to walk the list of paths to be
> removed, and either does a human interaction (for "-i" codepath)
> or goes to the filesystem to remove things.
>
> - builtin/config.c uses it in get_urlmatch() in preparation for
> doing network-y things.
>
> - builtin/describe.c walks the list of exclude and include patterns
> to run wildmatch on the candidate reference name to filter it out.
>
> ...
>
> In all of these examples, what happens for each item in the loop
> seems to me far heavier than the overhead this macro adds.
Yes, agreed.  As a small tweak,

	#define for_each_string_list_item(item, list) \
		for (item = ...; item && ...; ...)

produces nicer assembly than

	#define for_each_string_list_item(item, list) \
		for (item = ...; list->items && ...; ...)

(By the way, the potential optimization I described isn't valid: we
know that when item == NULL and list->items == NULL, list->nr is
always zero, but the compiler has no way to know that. So it can't
eliminate the NULL test. For comparison, a suitably smart compiler
should be able to eliminate a 'list->nr != 0 &&' guard if 'list'
doesn't escape in the loop body.)
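
For illustration, here is a minimal sketch of the "item &&" form
applied to a cut-down string_list.  The two structs below are
simplified stand-ins for the ones in string-list.h (the real struct
has more fields), and count_items() is a hypothetical helper that
exists only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the types in string-list.h (assumed layout). */
struct string_list_item {
	const char *string;
	void *util;
};

struct string_list {
	struct string_list_item *items;
	unsigned int nr;
};

/*
 * Guarded loop: when items is NULL, "item &&" short-circuits, so the
 * pointer addition NULL + 0 is never evaluated.
 */
#define for_each_string_list_item(item, list) \
	for (item = (list)->items; \
	     item && item < (list)->items + (list)->nr; \
	     ++item)

/* Hypothetical helper: count how many items the loop visits. */
static unsigned int count_items(struct string_list *list)
{
	struct string_list_item *item;
	unsigned int n = 0;

	for_each_string_list_item(item, list)
		n++;
	return n;
}
```

Calling count_items() on a list with items == NULL and nr == 0 visits
nothing, and on a list backed by a two-element array it visits both
elements.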
Recapping the other proposed fixes:
 A. Make it an invariant of string_list that items is never NULL and
    update string_list_init et al to use an empty array.  This is
    pretty painless until you notice some other structs that embed
    string_list without using STRING_LIST_INIT.  Updating all those
    would be too painful.
 B.	#define for_each_string_list_item(item, list) \
		if (list->items) \
			for (item = ...; ...; ... )

    This breaks a caller like

	if (foo)
		for_each_string_list_item(item, list)
			...
	else
		...

    making it a non-starter.
 C. As Gábor suggested,

	#define for_each_string_list_item(item, list) \
		if (!list->items) \
			; /* nothing to do */ \
		else \
			for (item = ...; ...; ...)

    This handles the caller from (B) correctly.  But it produces
    compiler warnings for a caller like

	if (foo)
		for_each_string_list_item(item, list)
			...

    There is only one instance of that construct in git today.  It
    looks nicer anyway with braces, so this approach would also be
    promising.
 D. Eliminate for_each_string_list_item and let callers just do

	unsigned int i;
	for (i = 0; i < list->nr; i++) {
		struct string_list_item *item = &list->items[i];
		...
	}

    Having to declare item is unnecessarily verbose, decreasing the
    appeal of this option.  I think I like it anyway, but I wasn't able
    to convince coccinelle to do it.
 E. Use subtraction instead of addition:

	#define for_each_string_list_item(item, list) \
		for (item = ...; \
		     (item == list->items ? 0 : item - list->items) < list->nr; \
		     item++)

    I expected the compiler to figure out that this is a long way of
    writing (item - list->items), but at least with gcc 4.8.4 -O2, no
    such luck.  This generates uglier assembly than the NULL check.
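
To make the difference between (B) and (C) concrete, here is a small
sketch of how the caller's "else" binds in each case.  The macro names
for_each_item_b and for_each_item_c, the walk_b() and walk_c() helpers,
and the cut-down structs are all hypothetical and exist only for this
example.  The intent of both helpers is "return -1 when foo is false,
otherwise return the number of items".  A compiler may warn about the
ambiguous "else" in walk_b(), which is exactly the problem being shown.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the types in string-list.h (assumed layout). */
struct string_list_item {
	const char *string;
	void *util;
};

struct string_list {
	struct string_list_item *items;
	unsigned int nr;
};

/* Option B: a bare "if" that a caller's "else" can attach to. */
#define for_each_item_b(item, list) \
	if ((list)->items) \
		for (item = (list)->items; \
		     item < (list)->items + (list)->nr; \
		     ++item)

/* Option C: the hidden "if" already carries its own "else". */
#define for_each_item_c(item, list) \
	if (!(list)->items) \
		; /* nothing to do */ \
	else \
		for (item = (list)->items; \
		     item < (list)->items + (list)->nr; \
		     ++item)

static int walk_b(int foo, struct string_list *list)
{
	struct string_list_item *item;
	int visited = 0;

	if (foo)
		for_each_item_b(item, list)
			visited++;
	else
		return -1; /* pairs with "if ((list)->items)", not "if (foo)" */
	return visited;
}

static int walk_c(int foo, struct string_list *list)
{
	struct string_list_item *item;
	int visited = 0;

	if (foo)
		for_each_item_c(item, list)
			visited++;
	else
		return -1; /* pairs with "if (foo)" as intended */
	return visited;
}
```

With an empty list (items == NULL), walk_b(1, ...) returns -1 and
walk_b(0, ...) returns 0, the opposite of what the caller meant, while
walk_c() returns 0 and -1 respectively.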
diff --git a/string-list.h b/string-list.h
index 29bfb7ae45..79ae567cbc 100644
--- a/string-list.h
+++ b/string-list.h
@@ -32,8 +32,10 @@ void string_list_clear_func(struct string_list *list, string_list_clear_func_t c
 typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
 int for_each_string_list(struct string_list *list,
 			 string_list_each_func_t, void *cb_data);
-#define for_each_string_list_item(item,list) \
-	for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
+#define for_each_string_list_item(item,list) \
+	for (item = (list)->items; \
+	     item && item < (list)->items + (list)->nr; \
+	     ++item)
 
 /*
  * Apply want to each item in list, retaining only the ones for which
--
2.14.1.821.g8fa685d3b7