git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Quint Guvernator <quintus.public@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH] general style: replaces memcmp() with proper starts_with()
Date: Wed, 12 Mar 2014 15:37:17 -0700	[thread overview]
Message-ID: <xmqqha73jb6q.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <20140312175624.GA7982@sigill.intra.peff.net> (Jeff King's message of "Wed, 12 Mar 2014 13:56:24 -0400")

Jeff King <peff@peff.net> writes:

> Thanks, I think this is a real readability improvement in most cases.
> ...
>
> I tried:
>
>   perl -i -lpe '
>     s/memcmp\(([^,]+), "(.*?)", (\d+)\)/
>      length($2) == $3 ?
>        qq{!starts_with($1, "$2")} :
>        $&
>     /ge
>   ' $(git ls-files '*.c')
>
> That comes up with almost the same results as yours, but misses a few
> cases that you caught (in addition to the need to simplify
> !!starts_with()):
>
>   - memcmp(foo, "bar", strlen("bar"))
>
>   - strings with interpolation, like "foo\n", which is actually 4
>     characters in length, not 5.
>
> But there were only a few of those, and I hand-verified them. So I feel
> pretty good that the changes are all correct transformations.
>
> That leaves the question of whether they are all improvements in
> readability (more on that below).

After reading the patch and the result of your Perl replacement, I'd
agree with the "correctness" but I am not as convinced as you seem
to be about the "real readability improvement in most cases."  "some
cases", perhaps, though.

Taking two random examples from an early and a late parts of the
patch:

--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -82,7 +82,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 				enum object_type type;
 				unsigned long size;
 				char *buffer = read_sha1_file(sha1, &type, &size);
-				if (memcmp(buffer, "object ", 7) ||
+				if (!starts_with(buffer, "object ") ||
 				    get_sha1_hex(buffer + 7, blob_sha1))
 					die("%s not a valid tag", sha1_to_hex(sha1));
 				free(buffer);

diff --git a/transport.c b/transport.c
index ca7bb44..a267822 100644
--- a/transport.c
+++ b/transport.c
@@ -1364,7 +1364,7 @@ static int refs_from_alternate_cb(struct alternate_object_database *e,
 
 	while (other[len-1] == '/')
 		other[--len] = '\0';
-	if (len < 8 || memcmp(other + len - 8, "/objects", 8))
+	if (len < 8 || !starts_with(other + len - 8, "/objects"))
 		return 0;
 	/* Is this a git repository with refs? */
 	memcpy(other + len - 8, "/refs", 6);


The original hunks show that the code knows and relies on magic
numbers 7 and 8 very clearly and there are rooms for improvement.
The result after the conversion, however, still have the same magic
numbers, but one less of them each.  Doesn't it make it harder to
later spot the patterns to come up with a better abstraction that
does not rely on the magic number?  Especially in the first hunk, we
can spot the repeated 7s in the original that make it glaringly
clear that we might want a better abstraction there, but in the text
after the patch, there is less clue that encourages us to take a
look at that "buffer + 7" further to make sure we do not feed a
wrong string to get_sha1_hex() by mistake when we update the
surrounding code or the data format this codepath parses.

I think grepping for "memcmp()" that compares the same number of
bytes as a constant string, like you showed in that Perl scriptlet,
is a very good first step to locate where we might want to look for
improvements.  I however think that a mechanical replacement of all
such memcmp() with !start_with() is of a dubious value.

After finding the hunk in the cat-file.c shown above, and after
seeing many other similar patterns, one may realize that it is a
good use case for a variant of skip_prefix() that returns boolean,
which we discussed earlier, perhaps like so:

	if (!skip_over(buffer, "object ", &object_name) ||
            get_sha1_hex(object_name, blob_sha1))
		die(...);

and such a solution would be what truly eradicates the reliance of
magic constants that might break by miscounting bytes in string
constants.  I do not think mechanical replacement to !start_with()
is "going in the right direction and reaching a halfway to that
goal".  I honestly think that it instead is taking us into a wrong
direction, without really solving the use of brittle magic constants
and making remaining reliance of them even harder to spot.

So....

  parent reply	other threads:[~2014-03-12 22:38 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-12 14:43 [PATCH] general style: replaces memcmp() with proper starts_with() Quint Guvernator
2014-03-12 17:56 ` Jeff King
2014-03-12 19:39   ` Junio C Hamano
2014-03-12 19:49     ` Jeff King
2014-03-12 20:08       ` Junio C Hamano
2014-03-12 21:14         ` Jeff King
2014-03-12 21:39           ` Jeff King
2014-03-12 22:06           ` Jeff King
2014-03-12 22:38             ` Junio C Hamano
2014-03-13  3:33               ` Quint Guvernator
2014-03-13 17:46                 ` Junio C Hamano
2014-03-14  4:57                 ` Jeff King
2014-03-14 14:51                   ` Quint Guvernator
2014-03-14 16:56                     ` Junio C Hamano
2014-03-12 20:51     ` René Scharfe
2014-03-12 21:16       ` David Kastrup
2014-03-12 21:45         ` René Scharfe
2014-03-12 20:52     ` David Kastrup
2014-03-12 22:37   ` Junio C Hamano [this message]
2014-03-13  6:27     ` David Kastrup
2014-03-13 17:47       ` Junio C Hamano
2014-03-13 17:55         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqha73jb6q.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=quintus.public@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).