From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: [PATCH] general style: replaces memcmp() with proper starts_with() Date: Wed, 12 Mar 2014 13:56:24 -0400 Message-ID: <20140312175624.GA7982@sigill.intra.peff.net> References: <1394635434-44979-1-git-send-email-quintus.public@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: git@vger.kernel.org To: Quint Guvernator X-From: git-owner@vger.kernel.org Wed Mar 12 18:56:41 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WNnON-0006AV-FO for gcvg-git-2@plane.gmane.org; Wed, 12 Mar 2014 18:56:36 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755644AbaCLR43 (ORCPT ); Wed, 12 Mar 2014 13:56:29 -0400 Received: from cloud.peff.net ([50.56.180.127]:38192 "HELO peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1755446AbaCLR40 (ORCPT ); Wed, 12 Mar 2014 13:56:26 -0400 Received: (qmail 23932 invoked by uid 102); 12 Mar 2014 17:56:26 -0000 Received: from c-71-63-4-13.hsd1.va.comcast.net (HELO sigill.intra.peff.net) (71.63.4.13) (smtp-auth username relayok, mechanism cram-md5) by peff.net (qpsmtpd/0.84) with ESMTPA; Wed, 12 Mar 2014 12:56:26 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Wed, 12 Mar 2014 13:56:24 -0400 Content-Disposition: inline In-Reply-To: <1394635434-44979-1-git-send-email-quintus.public@gmail.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Wed, Mar 12, 2014 at 10:43:54AM -0400, Quint Guvernator wrote: > memcmp() is replaced with negated starts_with() when comparing strings > from the beginning. starts_with() looks nicer and it saves the extra > argument of the length of the comparing string. Thanks, I think this is a real readability improvement in most cases. One of the things to do when reviewing a patch like this is make sure that there aren't any silly arithmetic mistakes. Checking that can be tedious, so I thought about how _I_ would do it programatically, and then compared our results. I tried: perl -i -lpe ' s/memcmp\(([^,]+), "(.*?)", (\d+)\)/ length($2) == $3 ? qq{!starts_with($1, "$2")} : $& /ge ' $(git ls-files '*.c') That comes up with almost the same results as yours, but misses a few cases that you caught (in addition to the need to simplify !!starts_with()): - memcmp(foo, "bar", strlen("bar")) - strings with interpolation, like "foo\n", which is actually 4 characters in length, not 5. But there were only a few of those, and I hand-verified them. So I feel pretty good that the changes are all correct transformations. That leaves the question of whether they are all improvements in readability (more on that below). > note: this commit properly handles negation in starts_with() > > Signed-off-by: Quint Guvernator > --- This note should go after the "---" line so that it is not part of the commit message (it is only interesting to people seeing v2 and wondering what changed from v1 earlier on the list, not people reading the history much later). > diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c > index 987a4c3..2777519 100644 > --- a/builtin/commit-tree.c > +++ b/builtin/commit-tree.c > @@ -66,7 +66,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix) > continue; > } > > - if (!memcmp(arg, "-S", 2)) { > + if (starts_with(arg, "-S")) { > sign_commit = arg + 2; > continue; > } This is an improvement, but we still have the magic "2" below. Would skip_prefix be a better match here, like: if ((val = skip_prefix(arg, "-S"))) { sign_commit = val; continue; } We've also talked about refactoring skip_prefix to return a boolean, and put the value in an out-parameter. Which would make this even more readable: if (skip_prefix(arg, "-S", &sign_commit)) continue; Maybe the time has come to do that. > --- a/builtin/mailinfo.c > +++ b/builtin/mailinfo.c > @@ -626,7 +626,7 @@ static int handle_boundary(void) > strbuf_addch(&newline, '\n'); > again: > if (line.len >= (*content_top)->len + 2 && > - !memcmp(line.buf + (*content_top)->len, "--", 2)) { > + starts_with(line.buf + (*content_top)->len, "--")) { I'm not sure if this improves readability or not. It's certainly nice to get rid of the magic "2", but starts_with is a bit of a misnomer, since we are indexing deep into the buffer anyway. And we still have the magic "2" above anyway. I'm on the fence. > @@ -727,8 +727,8 @@ static int is_scissors_line(const struct strbuf *line) > continue; > } > if (i + 1 < len && > - (!memcmp(buf + i, ">8", 2) || !memcmp(buf + i, "8<", 2) || > - !memcmp(buf + i, ">%", 2) || !memcmp(buf + i, "%<", 2))) { > + (starts_with(buf + i, ">8") || starts_with(buf + i, "8<") || > + starts_with(buf + i, ">%") || starts_with(buf + i, "%<"))) { Same as above. > @@ -66,7 +66,7 @@ static int verify_tag(char *buffer, unsigned long size) > return error("char%"PRIuMAX": could not find next \"\\n\"", > (uintmax_t) (type_line - buffer)); > tag_line++; > - if (memcmp(tag_line, "tag ", 4) || tag_line[4] == '\n') > + if (!starts_with(tag_line, "tag ") || tag_line[4] == '\n') > return error("char%"PRIuMAX": no \"tag \" found", > (uintmax_t) (tag_line - buffer)); I think this is another that could benefit from an enhanced skip_prefix: if (!skip_prefix(tag_line, "tag ", &tag_name) || *tag_name == '\n') ... and then we can get rid of the "tag_line += 4" that is used much later (in fact, I suspect that whole function could be improved in this respect). > @@ -269,7 +269,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s > return 0; > item->object.parsed = 1; > tail += size; > - if (tail <= bufptr + 46 || memcmp(bufptr, "tree ", 5) || bufptr[45] != '\n') > + if (tail <= bufptr + 46 || !starts_with(bufptr, "tree ") || bufptr[45] != '\n') > return error("bogus commit object %s", sha1_to_hex(item->object.sha1)); > if (get_sha1_hex(bufptr + 5, parent) < 0) Again, we just use "bufptr + 5" a bit later here. So a candidate for skip_prefix. > graft = lookup_commit_graft(item->object.sha1); > - while (bufptr + 48 < tail && !memcmp(bufptr, "parent ", 7)) { > + while (bufptr + 48 < tail && starts_with(bufptr, "parent ")) { Ditto here. > static inline int standard_header_field(const char *field, size_t len) > { > - return ((len == 4 && !memcmp(field, "tree ", 5)) || > - (len == 6 && !memcmp(field, "parent ", 7)) || > - (len == 6 && !memcmp(field, "author ", 7)) || > - (len == 9 && !memcmp(field, "committer ", 10)) || > - (len == 8 && !memcmp(field, "encoding ", 9))); > + return ((len == 4 && starts_with(field, "tree ")) || > + (len == 6 && starts_with(field, "parent ")) || > + (len == 6 && starts_with(field, "author ")) || > + (len == 9 && starts_with(field, "committer ")) || > + (len == 8 && starts_with(field, "encoding "))); These extra "len" checks are interesting. They look like an attempt to optimize lookup, since the caller will already have scanned forward to the space. I wonder if that makes a measurable difference, or is simply premature optimization. > @@ -514,7 +514,7 @@ static int git_proxy_command_options(const char *var, const char *value, > if (0 <= matchlen) { > /* core.gitproxy = none for kernel.org */ > if (matchlen == 4 && > - !memcmp(value, "none", 4)) > + starts_with(value, "none")) While technically correct, I'm not sure starts_with really expresses what's going on here. We're matching a full string, but we cannot use strcmp because the buffer isn't terminated. I'd think a more readable alternative would be something like: int mem_equals(const char *buf, size_t len, const char *match) { int match_len = strlen(match); return len == match_len && !memcmp(buf, match, len); } ... if (mem_equals(value, matchlen, "none")) ... That might also apply to some other cases (e.g., the case above with standard_header_field). The name mem_equals might not be descriptive enough, but hopefully you get the idea. > --- a/credential-cache.c > +++ b/credential-cache.c > @@ -54,7 +54,7 @@ static void spawn_daemon(const char *socket) > r = read_in_full(daemon.out, buf, sizeof(buf)); > if (r < 0) > die_errno("unable to read result code from cache daemon"); > - if (r != 3 || memcmp(buf, "ok\n", 3)) > + if (r != 3 || !starts_with(buf, "ok\n")) > die("cache daemon did not start: %.*s", r, buf); Another mem_equals case? > close(daemon.out); > } > diff --git a/fetch-pack.c b/fetch-pack.c > index f061f1f..17823ab 100644 > --- a/fetch-pack.c > +++ b/fetch-pack.c > @@ -506,7 +506,7 @@ static void filter_refs(struct fetch_pack_args *args, > int keep = 0; > next = ref->next; > > - if (!memcmp(ref->name, "refs/", 5) && > + if (starts_with(ref->name, "refs/") && > check_refname_format(ref->name, 0)) > ; /* trash */ > else { > diff --git a/fsck.c b/fsck.c > index 99c0497..3bd69da 100644 > --- a/fsck.c > +++ b/fsck.c > @@ -290,12 +290,12 @@ static int fsck_commit(struct commit *commit, fsck_error error_func) > if (commit->date == ULONG_MAX) > return error_func(&commit->object, FSCK_ERROR, "invalid author/committer line"); > > - if (memcmp(buffer, "tree ", 5)) > + if (!starts_with(buffer, "tree ")) > return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line"); > if (get_sha1_hex(buffer+5, tree_sha1) || buffer[45] != '\n') Another skip_prefix case. > - while (!memcmp(buffer, "parent ", 7)) { > + while (starts_with(buffer, "parent ")) { > if (get_sha1_hex(buffer+7, sha1) || buffer[47] != '\n') Ditto. > - if (memcmp(buffer, "author ", 7)) > + if (!starts_with(buffer, "author ")) > return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'author' line"); > buffer += 7; Ditto. > - if (memcmp(buffer, "committer ", strlen("committer "))) > + if (!starts_with(buffer, "committer ")) > return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'committer' line"); > buffer += strlen("committer "); Ditto. > diff --git a/http-walker.c b/http-walker.c > index 1516c5e..9a8d412 100644 > --- a/http-walker.c > +++ b/http-walker.c > @@ -245,7 +245,7 @@ static void process_alternates_response(void *callback_data) > - base); > okay = 1; > } > - } else if (!memcmp(data + i, "../", 3)) { > + } else if (starts_with(data + i, "../")) { > /* > * Relative URL; chop the corresponding > * number of subpath from base (and ../ > @@ -267,7 +267,7 @@ static void process_alternates_response(void *callback_data) > i += 3; Another skip_prefix? > --- a/path.c > +++ b/path.c > @@ -26,7 +26,7 @@ static char *get_pathname(void) > static char *cleanup_path(char *path) > { > /* Clean it up */ > - if (!memcmp(path, "./", 2)) { > + if (starts_with(path, "./")) { > path += 2; Another skip_prefix. > @@ -63,7 +63,7 @@ static int check_refname_component(const char *refname, int flags) > if (refname[1] == '\0') > return -1; /* Component equals ".". */ > } > - if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5)) > + if (cp - refname >= 5 && starts_with(cp - 5, ".lock")) > return -1; /* Refname ends with ".lock". */ Should this actually become ends_with? I guess it needs refname to be NUL-terminated, though, which it is not. I don't know if it is worth having a mem_ends_with variant, like: if (mem_ends_with(refname, cp - refname, ".lock")) ... > diff --git a/submodule.c b/submodule.c > index b80ecac..3de9ae5 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -203,7 +203,7 @@ void gitmodules_config(void) > if (active_nr > pos) { /* there is a .gitmodules */ > const struct cache_entry *ce = active_cache[pos]; > if (ce_namelen(ce) == 11 && > - !memcmp(ce->name, ".gitmodules", 11)) > + starts_with(ce->name, ".gitmodules")) > gitmodules_is_unmerged = 1; Jens singled this one out earlier, and I agree. I think this is a "mem_equals". > diff --git a/transport.c b/transport.c > index ca7bb44..a267822 100644 > --- a/transport.c > +++ b/transport.c > @@ -1364,7 +1364,7 @@ static int refs_from_alternate_cb(struct alternate_object_database *e, > > while (other[len-1] == '/') > other[--len] = '\0'; > - if (len < 8 || memcmp(other + len - 8, "/objects", 8)) > + if (len < 8 || !starts_with(other + len - 8, "/objects")) > return 0; This is another ends_with candidate (maybe mem_ends_with? I didn't check). Phew. That's a lot of changes. The ones I didn't mention are all hands-down readability improvements, though I suspect some of them may also be skip_prefix candidates (I didn't investigate each one, but only mentioned the ones where I saw the obvious magic number in the diff context). I think we can take most of them as-is, though, and perhaps even some of the ones I mentioned (even if skip_prefix can be used, switching to starts_with may be an intermediate improvement). If you don't feel like going further on this topic, that's OK, but I think with a few more helpers we could really further clean up some of these callsites. -Peff