git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] some minor starts_with()/skip_prefix() cleanups
@ 2023-01-07 13:24 Jeff King
  2023-01-07 13:26 ` [PATCH 1/2] convert trivial uses of strncmp() to starts_with() Jeff King
  2023-01-07 13:26 ` [PATCH 2/2] convert trivial uses of strncmp() to skip_prefix() Jeff King
  0 siblings, 2 replies; 6+ messages in thread
From: Jeff King @ 2023-01-07 13:24 UTC (permalink / raw)
  To: git

I happened to notice a recently-introduced strncmp() that could have
been a starts_with(). So I grepped and found a few more, and this is the
result.

  [1/2]: convert trivial uses of strncmp() to starts_with()
  [2/2]: convert trivial uses of strncmp() to skip_prefix()

 builtin/remote-ext.c | 6 ++++--
 builtin/remote-fd.c  | 2 +-
 bundle-uri.c         | 2 +-
 ref-filter.c         | 2 +-
 urlmatch.c           | 6 +++---
 ws.c                 | 7 ++++---
 6 files changed, 14 insertions(+), 11 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] convert trivial uses of strncmp() to starts_with()
  2023-01-07 13:24 [PATCH 0/2] some minor starts_with()/skip_prefix() cleanups Jeff King
@ 2023-01-07 13:26 ` Jeff King
  2023-01-07 13:26 ` [PATCH 2/2] convert trivial uses of strncmp() to skip_prefix() Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff King @ 2023-01-07 13:26 UTC (permalink / raw)
  To: git

It's more readable to use starts_with() instead of strncmp() to match a
prefix, as the latter requires a manually-computed length, and has the
funny "matching is zero" return value common to cmp functions.  This
patch converts several cases which were found with:

  git grep 'strncmp(.*, [0-9]*)'

But note that it doesn't convert all such cases. There are several where
the magic length number is repeated elsewhere in the code, like:

  /* handle "buf" which isn't NUL-terminated and might be too small */
  if (len >= 3 && !strncmp(buf, "foo", 3))

or:

  /* exact match for "foo", but within a larger string */
  if (end - buf == 3 && !strncmp(buf, "foo", 3))

While it would not produce the wrong outcome to use starts_with() in
these cases, we'd still be left with one instance of "3". We're better
to leave them for now, as the repeated "3" makes it clear that the two
are linked (there may be other refactorings that handle both, but
they're out of scope for this patch).

A few things to note while reading the patch:

  - all cases but one are trying to match, and so lose the extra "!".
    The case in the first hunk of urlmatch.c is not-matching, and hence
    gains a "!".

  - the case in remote-fd.c is matching the beginning of "connect foo",
    but we never look at str+8 to parse the "foo" part (which would make
    this a candidate for skip_prefix(), not starts_with()). This seems
    at first glance like a bug, but is a limitation of how remote-fd
    works.

  - the second hunk in urlmatch.c shows some cases adjacent to other
    strncmp() calls that are left. These are of the "exact match within
    a larger string" type, as described above.

Signed-off-by: Jeff King <peff@peff.net>
---
No coccinelle rule for the reason given above that we don't want to just
convert all cases.

I do think a starts_with_mem() could help some of those, but I left that
for another day.

 builtin/remote-fd.c | 2 +-
 bundle-uri.c        | 2 +-
 ref-filter.c        | 2 +-
 urlmatch.c          | 6 +++---
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/remote-fd.c b/builtin/remote-fd.c
index 91dfe07e06..b2a3980b1d 100644
--- a/builtin/remote-fd.c
+++ b/builtin/remote-fd.c
@@ -40,7 +40,7 @@ static void command_loop(int input_fd, int output_fd)
 		if (!strcmp(buffer, "capabilities")) {
 			printf("*connect\n\n");
 			fflush(stdout);
-		} else if (!strncmp(buffer, "connect ", 8)) {
+		} else if (starts_with(buffer, "connect ")) {
 			printf("\n");
 			fflush(stdout);
 			if (bidirectional_transfer_loop(input_fd,
diff --git a/bundle-uri.c b/bundle-uri.c
index 36268dda17..6462ab6deb 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -620,7 +620,7 @@ static int config_to_packet_line(const char *key, const char *value, void *data)
 {
 	struct packet_reader *writer = data;
 
-	if (!strncmp(key, "bundle.", 7))
+	if (starts_with(key, "bundle."))
 		packet_write_fmt(writer->fd, "%s=%s", key, value);
 
 	return 0;
diff --git a/ref-filter.c b/ref-filter.c
index a24324123e..f8203c6b05 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1209,7 +1209,7 @@ static const char *copy_name(const char *buf)
 {
 	const char *cp;
 	for (cp = buf; *cp && *cp != '\n'; cp++) {
-		if (!strncmp(cp, " <", 2))
+		if (starts_with(cp, " <"))
 			return xmemdupz(buf, cp - buf);
 	}
 	return xstrdup("");
diff --git a/urlmatch.c b/urlmatch.c
index b615adc923..620a648efc 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -209,7 +209,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 	 */
 	if (!url_len || strchr(":/?#", *url)) {
 		/* Missing host invalid for all URL schemes except file */
-		if (strncmp(norm.buf, "file:", 5)) {
+		if (!starts_with(norm.buf, "file:")) {
 			if (out_info) {
 				out_info->url = NULL;
 				out_info->err = _("missing host and scheme is not 'file:'");
@@ -268,11 +268,11 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 		if (url == slash_ptr) {
 			/* Skip ":" port with no number, it's same as default */
 		} else if (slash_ptr - url == 2 &&
-			   !strncmp(norm.buf, "http:", 5) &&
+			   starts_with(norm.buf, "http:") &&
 			   !strncmp(url, "80", 2)) {
 			/* Skip http :80 as it's the default */
 		} else if (slash_ptr - url == 3 &&
-			   !strncmp(norm.buf, "https:", 6) &&
+			   starts_with(norm.buf, "https:") &&
 			   !strncmp(url, "443", 3)) {
 			/* Skip https :443 as it's the default */
 		} else {
-- 
2.39.0.469.g9000b9c396


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] convert trivial uses of strncmp() to skip_prefix()
  2023-01-07 13:24 [PATCH 0/2] some minor starts_with()/skip_prefix() cleanups Jeff King
  2023-01-07 13:26 ` [PATCH 1/2] convert trivial uses of strncmp() to starts_with() Jeff King
@ 2023-01-07 13:26 ` Jeff King
  2023-01-07 13:33   ` Jeff King
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff King @ 2023-01-07 13:26 UTC (permalink / raw)
  To: git

As with the previous patch, using skip_prefix() is more readable and
less error-prone than a raw strncmp(), because it avoids a
manually-computed length. These cases differ from the previous patch
that uses starts_with() because they care about the value after the
matched prefix.

We can convert these to use skip_prefix() by introducing an extra
variable to hold the out-pointer.

Note in the case in ws.c that to get rid of the magic number "9"
completely, we also switch out "len" for recomputing the pointer
difference. These are equivalent because "len" is always "ep - string".

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/remote-ext.c | 6 ++++--
 ws.c                 | 7 ++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/builtin/remote-ext.c b/builtin/remote-ext.c
index fd3538d4f0..ee338bf440 100644
--- a/builtin/remote-ext.c
+++ b/builtin/remote-ext.c
@@ -169,6 +169,8 @@ static int command_loop(const char *child)
 
 	while (1) {
 		size_t i;
+		const char *arg;
+
 		if (!fgets(buffer, MAXCOMMAND - 1, stdin)) {
 			if (ferror(stdin))
 				die("Command input error");
@@ -182,10 +184,10 @@ static int command_loop(const char *child)
 		if (!strcmp(buffer, "capabilities")) {
 			printf("*connect\n\n");
 			fflush(stdout);
-		} else if (!strncmp(buffer, "connect ", 8)) {
+		} else if (skip_prefix(buffer, "connect ", &arg)) {
 			printf("\n");
 			fflush(stdout);
-			return run_child(child, buffer + 8);
+			return run_child(child, arg);
 		} else {
 			fprintf(stderr, "Bad command");
 			return 1;
diff --git a/ws.c b/ws.c
index 46a77bcad6..903bfcd53e 100644
--- a/ws.c
+++ b/ws.c
@@ -29,6 +29,7 @@ unsigned parse_whitespace_rule(const char *string)
 		int i;
 		size_t len;
 		const char *ep;
+		const char *arg;
 		int negated = 0;
 
 		string = string + strspn(string, ", \t\n\r");
@@ -52,15 +53,15 @@ unsigned parse_whitespace_rule(const char *string)
 				rule |= whitespace_rule_names[i].rule_bits;
 			break;
 		}
-		if (strncmp(string, "tabwidth=", 9) == 0) {
-			unsigned tabwidth = atoi(string + 9);
+		if (skip_prefix(string, "tabwidth=", &arg)) {
+			unsigned tabwidth = atoi(arg);
 			if (0 < tabwidth && tabwidth < 0100) {
 				rule &= ~WS_TAB_WIDTH_MASK;
 				rule |= tabwidth;
 			}
 			else
 				warning("tabwidth %.*s out of range",
-					(int)(len - 9), string + 9);
+					(int)(ep - arg), arg);
 		}
 		string = ep;
 	}
-- 
2.39.0.469.g9000b9c396

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] convert trivial uses of strncmp() to skip_prefix()
  2023-01-07 13:26 ` [PATCH 2/2] convert trivial uses of strncmp() to skip_prefix() Jeff King
@ 2023-01-07 13:33   ` Jeff King
  2023-01-07 21:29     ` René Scharfe
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2023-01-07 13:33 UTC (permalink / raw)
  To: git

On Sat, Jan 07, 2023 at 08:26:45AM -0500, Jeff King wrote:

> Note in the case in ws.c that to get rid of the magic number "9"
> completely, we also switch out "len" for recomputing the pointer
> difference. These are equivalent because "len" is always "ep - string".

By the way, one thing I noticed about this parse_whitespace_rule()
function is that it's very loose about its matching. It does:

        for (i = 0; i < ARRAY_SIZE(whitespace_rule_names); i++) {
                if (strncmp(whitespace_rule_names[i].rule_name,
                            string, len))
                        continue;
		...we matched...
		break;
	}

So it will prefix-match any of the options, even if there are
ambiguities. E.g.:

  git -c core.whitespace=-t show

will turn off "trailing-space", even though it would also match
"tab-in-indent". It would be easy enough to fix it to require the whole
name, but I wasn't sure if this prefix-matching was supposed to be a
feature (it doesn't seem to be documented anywhere, though).

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] convert trivial uses of strncmp() to skip_prefix()
  2023-01-07 13:33   ` Jeff King
@ 2023-01-07 21:29     ` René Scharfe
  2023-01-11 18:38       ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: René Scharfe @ 2023-01-07 21:29 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Junio C Hamano

Am 07.01.23 um 14:33 schrieb Jeff King:
> On Sat, Jan 07, 2023 at 08:26:45AM -0500, Jeff King wrote:
>
>> Note in the case in ws.c that to get rid of the magic number "9"
>> completely, we also switch out "len" for recomputing the pointer
>> difference. These are equivalent because "len" is always "ep - string".
>
> By the way, one thing I noticed about this parse_whitespace_rule()
> function is that it's very loose about its matching. It does:
>
>         for (i = 0; i < ARRAY_SIZE(whitespace_rule_names); i++) {
>                 if (strncmp(whitespace_rule_names[i].rule_name,
>                             string, len))
>                         continue;
> 		...we matched...
> 		break;
> 	}
>
> So it will prefix-match any of the options, even if there are
> ambiguities. E.g.:
>
>   git -c core.whitespace=-t show
>
> will turn off "trailing-space", even though it would also match
> "tab-in-indent". It would be easy enough to fix it to require the whole
> name, but I wasn't sure if this prefix-matching was supposed to be a
> feature (it doesn't seem to be documented anywhere, though).

Abbreviations are being used:

   $ git grep whitespace= .gitattributes
   .gitattributes:* whitespace=!indent,trail,space
   .gitattributes:*.[ch] whitespace=indent,trail,space diff=cpp
   .gitattributes:*.sh whitespace=indent,trail,space eol=lf

(Full names: trailing-space, space-before-tab, indent-with-non-tab.)

a9cc857ada (War on whitespace: first, a bit of retreat., 2007-11-02)
added this function.  Its commit message says:

   "You can specify the desired types of errors to be detected by
    listing their names (unique abbreviations are accepted)
    separated by comma."

And the part about abbreviations needing to be unique was true at that
time, because the names of the only two types back then started with
different characters.  aeb84b05ae (core.whitespace: split trailing-space
into blank-at-{eol,eof}, 2009-09-05) broke that property first.

The current behavior of picking the first match from a list with a
basically random order is not ideal, but at least stable.  Can we still
change it?  The intent seems to have been to ignore ambiguous prefix
matches.

René


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] convert trivial uses of strncmp() to skip_prefix()
  2023-01-07 21:29     ` René Scharfe
@ 2023-01-11 18:38       ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2023-01-11 18:38 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Junio C Hamano

On Sat, Jan 07, 2023 at 10:29:28PM +0100, René Scharfe wrote:

> > So it will prefix-match any of the options, even if there are
> > ambiguities. E.g.:
> >
> >   git -c core.whitespace=-t show
> >
> > will turn off "trailing-space", even though it would also match
> > "tab-in-indent". It would be easy enough to fix it to require the whole
> > name, but I wasn't sure if this prefix-matching was supposed to be a
> > feature (it doesn't seem to be documented anywhere, though).
> 
> Abbreviations are being used:
> 
>    $ git grep whitespace= .gitattributes
>    .gitattributes:* whitespace=!indent,trail,space
>    .gitattributes:*.[ch] whitespace=indent,trail,space diff=cpp
>    .gitattributes:*.sh whitespace=indent,trail,space eol=lf
> 
> (Full names: trailing-space, space-before-tab, indent-with-non-tab.)

Ah, right, I should have checked to see if _we_ are using them before
guessing whether anyone else might be.

> a9cc857ada (War on whitespace: first, a bit of retreat., 2007-11-02)
> added this function.  Its commit message says:
> 
>    "You can specify the desired types of errors to be detected by
>     listing their names (unique abbreviations are accepted)
>     separated by comma."

Thanks, I dug around for something like that but somehow missed it.

So yeah, we definitely want to keep this abbreviation feature working.
The only question is whether we ought to detect ambiguous ones. I think
something like this would work, though I wonder if is even worth
bothering about. I did not even see this in the wild, but it was just a
curiosity while I was adjusting something else in the function:

diff --git a/ws.c b/ws.c
index 46a77bcad6..f4efd66209 100644
--- a/ws.c
+++ b/ws.c
@@ -29,6 +29,7 @@ unsigned parse_whitespace_rule(const char *string)
 		int i;
 		size_t len;
 		const char *ep;
+		struct whitespace_rule *matched = NULL;
 		int negated = 0;
 
 		string = string + strspn(string, ", \t\n\r");
@@ -43,15 +44,27 @@ unsigned parse_whitespace_rule(const char *string)
 		if (!len)
 			break;
 		for (i = 0; i < ARRAY_SIZE(whitespace_rule_names); i++) {
-			if (strncmp(whitespace_rule_names[i].rule_name,
-				    string, len))
+			struct whitespace_rule *cur = &whitespace_rule_names[i];
+			if (strncmp(cur->rule_name, string, len))
 				continue;
+			if (matched) {
+				warning("ignoring ambiguous whitespace rule '%.*s'"
+					" (matches '%s' and '%s')",
+					(int)len, string,
+					matched->rule_name, cur->rule_name);
+				matched = NULL;
+				break;
+			}
+			matched = cur;
+		}
+
+		if (matched) {
 			if (negated)
-				rule &= ~whitespace_rule_names[i].rule_bits;
+				rule &= ~matched->rule_bits;
 			else
-				rule |= whitespace_rule_names[i].rule_bits;
-			break;
+				rule |= matched->rule_bits;
 		}
+
 		if (strncmp(string, "tabwidth=", 9) == 0) {
 			unsigned tabwidth = atoi(string + 9);
 			if (0 < tabwidth && tabwidth < 0100) {

-Peff

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-01-11 18:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-07 13:24 [PATCH 0/2] some minor starts_with()/skip_prefix() cleanups Jeff King
2023-01-07 13:26 ` [PATCH 1/2] convert trivial uses of strncmp() to starts_with() Jeff King
2023-01-07 13:26 ` [PATCH 2/2] convert trivial uses of strncmp() to skip_prefix() Jeff King
2023-01-07 13:33   ` Jeff King
2023-01-07 21:29     ` René Scharfe
2023-01-11 18:38       ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).