git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] Clean up wildmatch.c
@ 2023-02-10  7:59 Masahiro Yamada
  2023-02-10  7:59 ` [PATCH 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-10  7:59 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

This file was imported from rsysc, but it was already largely modified
for GIT. Why not further cleanups? I see a lot of unneeded stubs.



Masahiro Yamada (5):
  git-compat-util: add isblank() and isgraph()
  wildmatch: remove IS*() macros
  wildmatch: remove NEGATE_CLASS and NEGATE_CLASS2 macros
  wildmatch: use char instead of uchar
  wildmatch: more cleanups after killing uchar

 git-compat-util.h |   4 ++
 wildmatch.c       | 108 +++++++++++++---------------------------------
 2 files changed, 35 insertions(+), 77 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-10  7:59 [PATCH 0/5] Clean up wildmatch.c Masahiro Yamada
@ 2023-02-10  7:59 ` Masahiro Yamada
  2023-02-10 13:16   ` Ævar Arnfjörð Bjarmason
                     ` (2 more replies)
  2023-02-10  7:59 ` [PATCH 2/5] wildmatch: remove IS*() macros Masahiro Yamada
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-10  7:59 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

git-compat-util.h implements most of is*() macros.

Add isblank() and isgraph(), which are useful to clean up wildmatch.c
in a consistent way (in this and later commits).

Use them with care because they are not robust against the pointer
increment, like isblank(*s++).

The same issue already exists for isspace().

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

 git-compat-util.h |  4 ++++
 wildmatch.c       | 14 ++------------
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 4f0028ce60..90b43b2bc9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1212,10 +1212,12 @@ extern const unsigned char tolower_trans_tbl[256];
 /* Sane ctype - no locale, and works with signed chars */
 #undef isascii
 #undef isspace
+#undef isblank
 #undef isdigit
 #undef isalpha
 #undef isalnum
 #undef isprint
+#undef isgraph
 #undef islower
 #undef isupper
 #undef tolower
@@ -1236,10 +1238,12 @@ extern const unsigned char sane_ctype[256];
 #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
 #define isascii(x) (((x) & ~0x7f) == 0)
 #define isspace(x) sane_istest(x,GIT_SPACE)
+#define isblank(x) (isspace(x) || (x) == '\t')
 #define isdigit(x) sane_istest(x,GIT_DIGIT)
 #define isalpha(x) sane_istest(x,GIT_ALPHA)
 #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
 #define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)
+#define isgraph(x) (isprint(x) && !isspace(x))
 #define islower(x) sane_iscase(x, 1)
 #define isupper(x) sane_iscase(x, 0)
 #define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL)
diff --git a/wildmatch.c b/wildmatch.c
index 7e5a7ea1ea..85c4c7f8a7 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -28,18 +28,8 @@ typedef unsigned char uchar;
 # define ISASCII(c) isascii(c)
 #endif
 
-#ifdef isblank
-# define ISBLANK(c) (ISASCII(c) && isblank(c))
-#else
-# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
-#endif
-
-#ifdef isgraph
-# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
-#else
-# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
-#endif
-
+#define ISBLANK(c) (ISASCII(c) && isblank(c))
+#define ISGRAPH(c) (ISASCII(c) && isgraph(c))
 #define ISPRINT(c) (ISASCII(c) && isprint(c))
 #define ISDIGIT(c) (ISASCII(c) && isdigit(c))
 #define ISALNUM(c) (ISASCII(c) && isalnum(c))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] wildmatch: remove IS*() macros
  2023-02-10  7:59 [PATCH 0/5] Clean up wildmatch.c Masahiro Yamada
  2023-02-10  7:59 ` [PATCH 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
@ 2023-02-10  7:59 ` Masahiro Yamada
  2023-02-10  7:59 ` [PATCH 3/5] wildmatch: remove NEGATE_CLASS and NEGATE_CLASS2 macros Masahiro Yamada
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-10  7:59 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

This file was imported from rsync, which has some compatibility
layer because it relies on <ctypes.h> in C standard library.

In contrast, GIT has its own implementations in git-compat-util.h.

[1] isprint, isgraph

   They check the given char range in an obvious way

[2] isspace, isblank, isdigit, isalpha, isalnum, islower, isupper,
    iscntr, ispunct

   They look up sane_ctype[], which fills the range 0x80-0xff with 0.

[3] isxdigit

   It looks up hexval_table[], which fills the range 0x80-0xff with -1.

For all of these, ISACII() is a redundant check.

Remove IS*() macros, and directly use is*() in dowild().

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

 wildmatch.c | 55 ++++++++++++++++++-----------------------------------
 1 file changed, 18 insertions(+), 37 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index 85c4c7f8a7..a510b3fd23 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -22,25 +22,6 @@ typedef unsigned char uchar;
 				    && *(class) == *(litmatch) \
 				    && strncmp((char*)class, litmatch, len) == 0)
 
-#if defined STDC_HEADERS || !defined isascii
-# define ISASCII(c) 1
-#else
-# define ISASCII(c) isascii(c)
-#endif
-
-#define ISBLANK(c) (ISASCII(c) && isblank(c))
-#define ISGRAPH(c) (ISASCII(c) && isgraph(c))
-#define ISPRINT(c) (ISASCII(c) && isprint(c))
-#define ISDIGIT(c) (ISASCII(c) && isdigit(c))
-#define ISALNUM(c) (ISASCII(c) && isalnum(c))
-#define ISALPHA(c) (ISASCII(c) && isalpha(c))
-#define ISCNTRL(c) (ISASCII(c) && iscntrl(c))
-#define ISLOWER(c) (ISASCII(c) && islower(c))
-#define ISPUNCT(c) (ISASCII(c) && ispunct(c))
-#define ISSPACE(c) (ISASCII(c) && isspace(c))
-#define ISUPPER(c) (ISASCII(c) && isupper(c))
-#define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
-
 /* Match pattern "p" against "text" */
 static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 {
@@ -52,9 +33,9 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 		uchar t_ch, prev_ch;
 		if ((t_ch = *text) == '\0' && p_ch != '*')
 			return WM_ABORT_ALL;
-		if ((flags & WM_CASEFOLD) && ISUPPER(t_ch))
+		if ((flags & WM_CASEFOLD) && isupper(t_ch))
 			t_ch = tolower(t_ch);
-		if ((flags & WM_CASEFOLD) && ISUPPER(p_ch))
+		if ((flags & WM_CASEFOLD) && isupper(p_ch))
 			p_ch = tolower(p_ch);
 		switch (p_ch) {
 		case '\\':
@@ -133,11 +114,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 				 */
 				if (!is_glob_special(*p)) {
 					p_ch = *p;
-					if ((flags & WM_CASEFOLD) && ISUPPER(p_ch))
+					if ((flags & WM_CASEFOLD) && isupper(p_ch))
 						p_ch = tolower(p_ch);
 					while ((t_ch = *text) != '\0' &&
 					       (match_slash || t_ch != '/')) {
-						if ((flags & WM_CASEFOLD) && ISUPPER(t_ch))
+						if ((flags & WM_CASEFOLD) && isupper(t_ch))
 							t_ch = tolower(t_ch);
 						if (t_ch == p_ch)
 							break;
@@ -186,7 +167,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 					}
 					if (t_ch <= p_ch && t_ch >= prev_ch)
 						matched = 1;
-					else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) {
+					else if ((flags & WM_CASEFOLD) && islower(t_ch)) {
 						uchar t_ch_upper = toupper(t_ch);
 						if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
 							matched = 1;
@@ -208,42 +189,42 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 						continue;
 					}
 					if (CC_EQ(s,i, "alnum")) {
-						if (ISALNUM(t_ch))
+						if (isalnum(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "alpha")) {
-						if (ISALPHA(t_ch))
+						if (isalpha(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "blank")) {
-						if (ISBLANK(t_ch))
+						if (isblank(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "cntrl")) {
-						if (ISCNTRL(t_ch))
+						if (iscntrl(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "digit")) {
-						if (ISDIGIT(t_ch))
+						if (isdigit(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "graph")) {
-						if (ISGRAPH(t_ch))
+						if (isgraph(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "lower")) {
-						if (ISLOWER(t_ch))
+						if (islower(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "print")) {
-						if (ISPRINT(t_ch))
+						if (isprint(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "punct")) {
-						if (ISPUNCT(t_ch))
+						if (ispunct(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "space")) {
-						if (ISSPACE(t_ch))
+						if (isspace(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "upper")) {
-						if (ISUPPER(t_ch))
+						if (isupper(t_ch))
 							matched = 1;
-						else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch))
+						else if ((flags & WM_CASEFOLD) && islower(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "xdigit")) {
-						if (ISXDIGIT(t_ch))
+						if (isxdigit(t_ch))
 							matched = 1;
 					} else /* malformed [:class:] string */
 						return WM_ABORT_ALL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] wildmatch: remove NEGATE_CLASS and NEGATE_CLASS2 macros
  2023-02-10  7:59 [PATCH 0/5] Clean up wildmatch.c Masahiro Yamada
  2023-02-10  7:59 ` [PATCH 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
  2023-02-10  7:59 ` [PATCH 2/5] wildmatch: remove IS*() macros Masahiro Yamada
@ 2023-02-10  7:59 ` Masahiro Yamada
  2023-02-10 13:11   ` Ævar Arnfjörð Bjarmason
  2023-02-10  7:59 ` [PATCH 4/5] wildmatch: use char instead of uchar Masahiro Yamada
  2023-02-10  7:59 ` [PATCH 5/5] wildmatch: more cleanups after killing uchar Masahiro Yamada
  4 siblings, 1 reply; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-10  7:59 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

The other glob patterns are hard-coded in dowild(). Do likewise.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

 wildmatch.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index a510b3fd23..93800b8eac 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -14,10 +14,6 @@
 
 typedef unsigned char uchar;
 
-/* What character marks an inverted character class? */
-#define NEGATE_CLASS	'!'
-#define NEGATE_CLASS2	'^'
-
 #define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
 				    && *(class) == *(litmatch) \
 				    && strncmp((char*)class, litmatch, len) == 0)
@@ -137,12 +133,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 			return WM_ABORT_ALL;
 		case '[':
 			p_ch = *++p;
-#ifdef NEGATE_CLASS2
-			if (p_ch == NEGATE_CLASS2)
-				p_ch = NEGATE_CLASS;
-#endif
 			/* Assign literal 1/0 because of "matched" comparison. */
-			negated = p_ch == NEGATE_CLASS ? 1 : 0;
+			negated = p_ch == '!' || p_ch == '^' ? 1 : 0;
 			if (negated) {
 				/* Inverted character class. */
 				p_ch = *++p;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] wildmatch: use char instead of uchar
  2023-02-10  7:59 [PATCH 0/5] Clean up wildmatch.c Masahiro Yamada
                   ` (2 preceding siblings ...)
  2023-02-10  7:59 ` [PATCH 3/5] wildmatch: remove NEGATE_CLASS and NEGATE_CLASS2 macros Masahiro Yamada
@ 2023-02-10  7:59 ` Masahiro Yamada
  2023-02-10 13:09   ` Ævar Arnfjörð Bjarmason
  2023-02-10  7:59 ` [PATCH 5/5] wildmatch: more cleanups after killing uchar Masahiro Yamada
  4 siblings, 1 reply; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-10  7:59 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

dowild() casts (char *) and (uchar *) back-and-forth, which is
ugly.

This file was imported from rsync, which started to use (unsigned char)
since the following commit:

 | commit e11c42511903adc6d27cf1671cc76fa711ea37e5
 | Author: Wayne Davison <wayned@samba.org>
 | Date:   Sun Jul 6 04:33:54 2003 +0000
 |
 |     - Added [:class:] handling to the character-class code.
 |     - Use explicit unsigned characters for proper set checks.
 |     - Made the character-class code honor backslash escapes.
 |     - Accept '^' as a class-negation character in addition to '!'.

Perhaps, it was needed because rsync relies on is*() from <ctypes.h>.

GIT has its own implementations, so the behavior is clear.

In fact, commit 4546738b58a0 ("Unlocalized isspace and friends")
says one of the motivations is "we want the right signed behaviour".

sane_istest() casts the given character to (unsigned char) anyway
before sane_ctype[] table lookup, so dowild() can use 'char'.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

 wildmatch.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index 93800b8eac..7dffd783cb 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -12,21 +12,19 @@
 #include "cache.h"
 #include "wildmatch.h"
 
-typedef unsigned char uchar;
-
 #define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
 				    && *(class) == *(litmatch) \
-				    && strncmp((char*)class, litmatch, len) == 0)
+				    && strncmp(class, litmatch, len) == 0)
 
 /* Match pattern "p" against "text" */
-static int dowild(const uchar *p, const uchar *text, unsigned int flags)
+static int dowild(const char *p, const char *text, unsigned int flags)
 {
-	uchar p_ch;
-	const uchar *pattern = p;
+	char p_ch;
+	const char *pattern = p;
 
 	for ( ; (p_ch = *p) != '\0'; text++, p++) {
 		int matched, match_slash, negated;
-		uchar t_ch, prev_ch;
+		char t_ch, prev_ch;
 		if ((t_ch = *text) == '\0' && p_ch != '*')
 			return WM_ABORT_ALL;
 		if ((flags & WM_CASEFOLD) && isupper(t_ch))
@@ -50,7 +48,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 			continue;
 		case '*':
 			if (*++p == '*') {
-				const uchar *prev_p = p - 2;
+				const char *prev_p = p - 2;
 				while (*++p == '*') {}
 				if (!(flags & WM_PATHNAME))
 					/* without WM_PATHNAME, '*' == '**' */
@@ -90,10 +88,10 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 				 * with WM_PATHNAME matches the next
 				 * directory
 				 */
-				const char *slash = strchr((char*)text, '/');
+				const char *slash = strchr(text, '/');
 				if (!slash)
 					return WM_NOMATCH;
-				text = (const uchar*)slash;
+				text = slash;
 				/* the slash is consumed by the top-level for loop */
 				break;
 			}
@@ -160,13 +158,13 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 					if (t_ch <= p_ch && t_ch >= prev_ch)
 						matched = 1;
 					else if ((flags & WM_CASEFOLD) && islower(t_ch)) {
-						uchar t_ch_upper = toupper(t_ch);
+						char t_ch_upper = toupper(t_ch);
 						if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
 							matched = 1;
 					}
 					p_ch = 0; /* This makes "prev_ch" get set to 0. */
 				} else if (p_ch == '[' && p[1] == ':') {
-					const uchar *s;
+					const char *s;
 					int i;
 					for (s = p += 2; (p_ch = *p) && p_ch != ']'; p++) {} /*SHARED ITERATOR*/
 					if (!p_ch)
@@ -237,5 +235,5 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 /* Match the "pattern" against the "text" string. */
 int wildmatch(const char *pattern, const char *text, unsigned int flags)
 {
-	return dowild((const uchar*)pattern, (const uchar*)text, flags);
+	return dowild(pattern, text, flags);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] wildmatch: more cleanups after killing uchar
  2023-02-10  7:59 [PATCH 0/5] Clean up wildmatch.c Masahiro Yamada
                   ` (3 preceding siblings ...)
  2023-02-10  7:59 ` [PATCH 4/5] wildmatch: use char instead of uchar Masahiro Yamada
@ 2023-02-10  7:59 ` Masahiro Yamada
  4 siblings, 0 replies; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-10  7:59 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

Remove the local function dowild(), which is now equivalent to
wildmatch().

Remove the local variable, slash.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

 wildmatch.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index 7dffd783cb..24577e9b8e 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -17,7 +17,7 @@
 				    && strncmp(class, litmatch, len) == 0)
 
 /* Match pattern "p" against "text" */
-static int dowild(const char *p, const char *text, unsigned int flags)
+int wildmatch(const char *p, const char *text, unsigned int flags)
 {
 	char p_ch;
 	const char *pattern = p;
@@ -66,7 +66,7 @@ static int dowild(const char *p, const char *text, unsigned int flags)
 					 * both foo/bar and foo/a/bar.
 					 */
 					if (p[0] == '/' &&
-					    dowild(p + 1, text, flags) == WM_MATCH)
+					    wildmatch(p + 1, text, flags) == WM_MATCH)
 						return WM_MATCH;
 					match_slash = 1;
 				} else /* WM_PATHNAME is set */
@@ -88,10 +88,9 @@ static int dowild(const char *p, const char *text, unsigned int flags)
 				 * with WM_PATHNAME matches the next
 				 * directory
 				 */
-				const char *slash = strchr(text, '/');
-				if (!slash)
+				text = strchr(text, '/');
+				if (!text)
 					return WM_NOMATCH;
-				text = slash;
 				/* the slash is consumed by the top-level for loop */
 				break;
 			}
@@ -121,7 +120,7 @@ static int dowild(const char *p, const char *text, unsigned int flags)
 					if (t_ch != p_ch)
 						return WM_NOMATCH;
 				}
-				if ((matched = dowild(p, text, flags)) != WM_NOMATCH) {
+				if ((matched = wildmatch(p, text, flags)) != WM_NOMATCH) {
 					if (!match_slash || matched != WM_ABORT_TO_STARSTAR)
 						return matched;
 				} else if (!match_slash && t_ch == '/')
@@ -231,9 +230,3 @@ static int dowild(const char *p, const char *text, unsigned int flags)
 
 	return *text ? WM_NOMATCH : WM_MATCH;
 }
-
-/* Match the "pattern" against the "text" string. */
-int wildmatch(const char *pattern, const char *text, unsigned int flags)
-{
-	return dowild(pattern, text, flags);
-}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] wildmatch: use char instead of uchar
  2023-02-10  7:59 ` [PATCH 4/5] wildmatch: use char instead of uchar Masahiro Yamada
@ 2023-02-10 13:09   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-02-10 13:09 UTC (permalink / raw)
  To: Masahiro Yamada; +Cc: git, Nguyễn Thái Ngọc Duy


On Fri, Feb 10 2023, Masahiro Yamada wrote:

> dowild() casts (char *) and (uchar *) back-and-forth, which is
> ugly.
>
> This file was imported from rsync, which started to use (unsigned char)
> since the following commit:
>
>  | commit e11c42511903adc6d27cf1671cc76fa711ea37e5
>  | Author: Wayne Davison <wayned@samba.org>
>  | Date:   Sun Jul 6 04:33:54 2003 +0000
>  |
>  |     - Added [:class:] handling to the character-class code.
>  |     - Use explicit unsigned characters for proper set checks.
>  |     - Made the character-class code honor backslash escapes.
>  |     - Accept '^' as a class-negation character in addition to '!'.
>
> Perhaps, it was needed because rsync relies on is*() from <ctypes.h>.
>
> GIT has its own implementations, so the behavior is clear.
>
> In fact, commit 4546738b58a0 ("Unlocalized isspace and friends")
> says one of the motivations is "we want the right signed behaviour".
>
> sane_istest() casts the given character to (unsigned char) anyway
> before sane_ctype[] table lookup, so dowild() can use 'char'.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
>  wildmatch.c | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/wildmatch.c b/wildmatch.c
> index 93800b8eac..7dffd783cb 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -12,21 +12,19 @@
>  #include "cache.h"
>  #include "wildmatch.h"
>  
> -typedef unsigned char uchar;
> -
>  #define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
>  				    && *(class) == *(litmatch) \
> -				    && strncmp((char*)class, litmatch, len) == 0)
> +				    && strncmp(class, litmatch, len) == 0)
>  
>  /* Match pattern "p" against "text" */
> -static int dowild(const uchar *p, const uchar *text, unsigned int flags)
> +static int dowild(const char *p, const char *text, unsigned int flags)
>  {
> -	uchar p_ch;
> -	const uchar *pattern = p;
> +	char p_ch;
> +	const char *pattern = p;
>  
>  	for ( ; (p_ch = *p) != '\0'; text++, p++) {
>  		int matched, match_slash, negated;
> -		uchar t_ch, prev_ch;
> +		char t_ch, prev_ch;
>  		if ((t_ch = *text) == '\0' && p_ch != '*')
>  			return WM_ABORT_ALL;
>  		if ((flags & WM_CASEFOLD) && isupper(t_ch))
> @@ -50,7 +48,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  			continue;
>  		case '*':
>  			if (*++p == '*') {
> -				const uchar *prev_p = p - 2;
> +				const char *prev_p = p - 2;
>  				while (*++p == '*') {}
>  				if (!(flags & WM_PATHNAME))
>  					/* without WM_PATHNAME, '*' == '**' */
> @@ -90,10 +88,10 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  				 * with WM_PATHNAME matches the next
>  				 * directory
>  				 */
> -				const char *slash = strchr((char*)text, '/');
> +				const char *slash = strchr(text, '/');
>  				if (!slash)
>  					return WM_NOMATCH;
> -				text = (const uchar*)slash;
> +				text = slash;
>  				/* the slash is consumed by the top-level for loop */
>  				break;
>  			}
> @@ -160,13 +158,13 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  					if (t_ch <= p_ch && t_ch >= prev_ch)
>  						matched = 1;
>  					else if ((flags & WM_CASEFOLD) && islower(t_ch)) {
> -						uchar t_ch_upper = toupper(t_ch);
> +						char t_ch_upper = toupper(t_ch);
>  						if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
>  							matched = 1;
>  					}
>  					p_ch = 0; /* This makes "prev_ch" get set to 0. */
>  				} else if (p_ch == '[' && p[1] == ':') {
> -					const uchar *s;
> +					const char *s;
>  					int i;
>  					for (s = p += 2; (p_ch = *p) && p_ch != ']'; p++) {} /*SHARED ITERATOR*/
>  					if (!p_ch)
> @@ -237,5 +235,5 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  /* Match the "pattern" against the "text" string. */
>  int wildmatch(const char *pattern, const char *text, unsigned int flags)
>  {
> -	return dowild((const uchar*)pattern, (const uchar*)text, flags);
> +	return dowild(pattern, text, flags);
>  }

This looks good to me. I independently wrote much the same a while ago
for another reason, in: https://github.com/avar/git/commit/079f555375a

I.e. this happens to be the only bit in-tree that's stopping us from
running the xlc compiler in the c99 mode.

My solution was different, but I like yours better. I had not done your
analysis to discover that we didn't need this to be unsigned in the
first place, I merly converted the "uchar" to an "unsigned char".


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] wildmatch: remove NEGATE_CLASS and NEGATE_CLASS2 macros
  2023-02-10  7:59 ` [PATCH 3/5] wildmatch: remove NEGATE_CLASS and NEGATE_CLASS2 macros Masahiro Yamada
@ 2023-02-10 13:11   ` Ævar Arnfjörð Bjarmason
  2023-02-10 17:03     ` Masahiro Yamada
  0 siblings, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-02-10 13:11 UTC (permalink / raw)
  To: Masahiro Yamada; +Cc: git, Nguyễn Thái Ngọc Duy


On Fri, Feb 10 2023, Masahiro Yamada wrote:

> The other glob patterns are hard-coded in dowild(). Do likewise.
> [...]
> diff --git a/wildmatch.c b/wildmatch.c
> index a510b3fd23..93800b8eac 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -14,10 +14,6 @@
>  
>  typedef unsigned char uchar;
>  
> -/* What character marks an inverted character class? */
> -#define NEGATE_CLASS	'!'
> -#define NEGATE_CLASS2	'^'

Thanks, maybe these made sense in rsync's codebase, but...

>  #define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
>  				    && *(class) == *(litmatch) \
>  				    && strncmp((char*)class, litmatch, len) == 0)
> @@ -137,12 +133,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  			return WM_ABORT_ALL;
>  		case '[':
>  			p_ch = *++p;

...as the context shows we hardcode most of these tokens.

> -#ifdef NEGATE_CLASS2
> -			if (p_ch == NEGATE_CLASS2)
> -				p_ch = NEGATE_CLASS;
> -#endif

Hrm, but isn't this a logic error? No it's not, because...

>  			/* Assign literal 1/0 because of "matched" comparison. */
> -			negated = p_ch == NEGATE_CLASS ? 1 : 0;
> +			negated = p_ch == '!' || p_ch == '^' ? 1 : 0;

...you're refcatoring thise while at it.

Personally I'd prefer to just see this change without this, and it came
as a surprise given the commit message.

I find the pre-image easier to read (sans the macro check, which you
should be removing).


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-10  7:59 ` [PATCH 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
@ 2023-02-10 13:16   ` Ævar Arnfjörð Bjarmason
  2023-02-10 16:56     ` Masahiro Yamada
  2023-02-10 19:10   ` Junio C Hamano
  2023-02-10 22:03   ` René Scharfe
  2 siblings, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-02-10 13:16 UTC (permalink / raw)
  To: Masahiro Yamada; +Cc: git, Nguyễn Thái Ngọc Duy


On Fri, Feb 10 2023, Masahiro Yamada wrote:

> git-compat-util.h implements most of is*() macros.
>
> Add isblank() and isgraph(), which are useful to clean up wildmatch.c
> in a consistent way (in this and later commits).

You are on a journey to fix wildmatch.c, so...

> The same issue already exists for isspace().
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
> [...]
> -#ifdef isblank
> -# define ISBLANK(c) (ISASCII(c) && isblank(c))
> -#else
> -# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
> -#endif
> -
> -#ifdef isgraph
> -# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
> -#else
> -# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
> -#endif
> -
> +#define ISBLANK(c) (ISASCII(c) && isblank(c))
> +#define ISGRAPH(c) (ISASCII(c) && isgraph(c))
>  #define ISPRINT(c) (ISASCII(c) && isprint(c))
>  #define ISDIGIT(c) (ISASCII(c) && isdigit(c))
>  #define ISALNUM(c) (ISASCII(c) && isalnum(c))

You make this change, but not others in tree.

Personally I wouldn't mind seeing this expanded to fix the various other
trivially convertable cases in-tree, e.g.:
	
	$ git -P grep -n "(' '|'\\\t').*\|\|.*(' '|'\\\t')"
	builtin/am.c:602:               if (*sb.buf == '\t' || *sb.buf == ' ')
	compat/regex/regex_internal.h:60:# define isblank(ch) ((ch) == ' ' || (ch) == '\t')
	compat/regex/regex_internal.h:73:   return (c == ' ' || c == '\t');
	config.c:893:   while (c == ' ' || c == '\t')
	fsck.c:837:     while (*p == ' ' || *p == '\t')
	mailinfo.c:749:     (line->buf[0] == ' ' || line->buf[0] == '\t')) {
	sequencer.c:2476:                (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
	sequencer.c:2536:                  (*bol == ' ' || *bol == '\t')) {
	sequencer.c:2540:                                 (*bol == ' ' || *bol == '\t')) {
	sequencer.c:2594:       if (is_command(TODO_PICK, &bol) && (*bol == ' ' || *bol == '\t'))
	sequencer.c:2597:                (*bol == ' ' || *bol == '\t'))
	t/helper/test-json-writer.c:443:                if (c == '\n' || c == '\r' || c == ' ' || c == '\t')
	t/helper/test-json-writer.c:449:        while (*buf == ' ' || *buf == '\t')
	t/t4256/1/mailinfo.c:737:           (line->buf[0] == ' ' || line->buf[0] == '\t')) {
	t/t4256/1/mailinfo.c.orig:726:      (line->buf[0] == ' ' || line->buf[0] == '\t')) {
	trailer.c:630:          if (c != line && (*c == ' ' || *c == '\t')) {
	wildmatch.c:34:# define ISBLANK(c) ((c) == ' ' || (c) == '\t')

Some of those are false positves, but most are not.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-10 13:16   ` Ævar Arnfjörð Bjarmason
@ 2023-02-10 16:56     ` Masahiro Yamada
  0 siblings, 0 replies; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-10 16:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Nguyễn Thái Ngọc Duy

On Fri, Feb 10, 2023 at 10:20 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Fri, Feb 10 2023, Masahiro Yamada wrote:
>
> > git-compat-util.h implements most of is*() macros.
> >
> > Add isblank() and isgraph(), which are useful to clean up wildmatch.c
> > in a consistent way (in this and later commits).
>
> You are on a journey to fix wildmatch.c, so...
>
> > The same issue already exists for isspace().
> >
> > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> > ---
> > [...]
> > -#ifdef isblank
> > -# define ISBLANK(c) (ISASCII(c) && isblank(c))
> > -#else
> > -# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
> > -#endif
> > -
> > -#ifdef isgraph
> > -# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
> > -#else
> > -# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
> > -#endif
> > -
> > +#define ISBLANK(c) (ISASCII(c) && isblank(c))
> > +#define ISGRAPH(c) (ISASCII(c) && isgraph(c))
> >  #define ISPRINT(c) (ISASCII(c) && isprint(c))
> >  #define ISDIGIT(c) (ISASCII(c) && isdigit(c))
> >  #define ISALNUM(c) (ISASCII(c) && isalnum(c))
>
> You make this change, but not others in tree.
>
> Personally I wouldn't mind seeing this expanded to fix the various other


I wouldn't mind seeing somebody use this for tree-wide cleanups.
I do not see any reason to do that in this patch set, either.





> trivially convertable cases in-tree, e.g.:
>
>         $ git -P grep -n "(' '|'\\\t').*\|\|.*(' '|'\\\t')"
>         builtin/am.c:602:               if (*sb.buf == '\t' || *sb.buf == ' ')
>         compat/regex/regex_internal.h:60:# define isblank(ch) ((ch) == ' ' || (ch) == '\t')
>         compat/regex/regex_internal.h:73:   return (c == ' ' || c == '\t');
>         config.c:893:   while (c == ' ' || c == '\t')
>         fsck.c:837:     while (*p == ' ' || *p == '\t')
>         mailinfo.c:749:     (line->buf[0] == ' ' || line->buf[0] == '\t')) {
>         sequencer.c:2476:                (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
>         sequencer.c:2536:                  (*bol == ' ' || *bol == '\t')) {
>         sequencer.c:2540:                                 (*bol == ' ' || *bol == '\t')) {
>         sequencer.c:2594:       if (is_command(TODO_PICK, &bol) && (*bol == ' ' || *bol == '\t'))
>         sequencer.c:2597:                (*bol == ' ' || *bol == '\t'))
>         t/helper/test-json-writer.c:443:                if (c == '\n' || c == '\r' || c == ' ' || c == '\t')
>         t/helper/test-json-writer.c:449:        while (*buf == ' ' || *buf == '\t')
>         t/t4256/1/mailinfo.c:737:           (line->buf[0] == ' ' || line->buf[0] == '\t')) {
>         t/t4256/1/mailinfo.c.orig:726:      (line->buf[0] == ' ' || line->buf[0] == '\t')) {
>         trailer.c:630:          if (c != line && (*c == ' ' || *c == '\t')) {
>         wildmatch.c:34:# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
>
> Some of those are false positves, but most are not.



--
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] wildmatch: remove NEGATE_CLASS and NEGATE_CLASS2 macros
  2023-02-10 13:11   ` Ævar Arnfjörð Bjarmason
@ 2023-02-10 17:03     ` Masahiro Yamada
  0 siblings, 0 replies; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-10 17:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Nguyễn Thái Ngọc Duy

On Fri, Feb 10, 2023 at 10:14 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Fri, Feb 10 2023, Masahiro Yamada wrote:
>
> > The other glob patterns are hard-coded in dowild(). Do likewise.
> > [...]
> > diff --git a/wildmatch.c b/wildmatch.c
> > index a510b3fd23..93800b8eac 100644
> > --- a/wildmatch.c
> > +++ b/wildmatch.c
> > @@ -14,10 +14,6 @@
> >
> >  typedef unsigned char uchar;
> >
> > -/* What character marks an inverted character class? */
> > -#define NEGATE_CLASS '!'
> > -#define NEGATE_CLASS2        '^'
>
> Thanks, maybe these made sense in rsync's codebase, but...
>
> >  #define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
> >                                   && *(class) == *(litmatch) \
> >                                   && strncmp((char*)class, litmatch, len) == 0)
> > @@ -137,12 +133,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
> >                       return WM_ABORT_ALL;
> >               case '[':
> >                       p_ch = *++p;
>
> ...as the context shows we hardcode most of these tokens.
>
> > -#ifdef NEGATE_CLASS2
> > -                     if (p_ch == NEGATE_CLASS2)
> > -                             p_ch = NEGATE_CLASS;
> > -#endif
>
> Hrm, but isn't this a logic error? No it's not, because...
>
> >                       /* Assign literal 1/0 because of "matched" comparison. */
> > -                     negated = p_ch == NEGATE_CLASS ? 1 : 0;
> > +                     negated = p_ch == '!' || p_ch == '^' ? 1 : 0;
>
> ...you're refcatoring thise while at it.


Yes, this is a trivial refactoring.


Or, do you suggest splitting this into two patches,
verbatim replace + a small clean-up?

>
> Personally I'd prefer to just see this change without this, and it came
> as a surprise given the commit message.
>
> I find the pre-image easier to read (sans the macro check, which you
> should be removing).
>


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-10  7:59 ` [PATCH 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
  2023-02-10 13:16   ` Ævar Arnfjörð Bjarmason
@ 2023-02-10 19:10   ` Junio C Hamano
  2023-02-10 19:25     ` Masahiro Yamada
  2023-02-10 22:03   ` René Scharfe
  2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2023-02-10 19:10 UTC (permalink / raw)
  To: Masahiro Yamada; +Cc: git, Nguyễn Thái Ngọc Duy

Masahiro Yamada <masahiroy@kernel.org> writes:

> Use them with care because they are not robust against the pointer
> increment, like isblank(*s++).
>
> The same issue already exists for isspace().

Does it?

>  #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
>  #define isascii(x) (((x) & ~0x7f) == 0)
>  #define isspace(x) sane_istest(x,GIT_SPACE)

sane_istest() evaluates x and mask only once, and isspace evaluates
x only once, no?

	isspace(*x++)
	->
	sane_istest(*x++,GIT_SPACE)
	->
	((sane_ctype[(unsigned char)(*x++)] & GIT_SPACE) != 0)

> +#define isblank(x) (isspace(x) || (x) == '\t')
> +#define isgraph(x) (isprint(x) && !isspace(x))

Given that all the other tests are implemented with just picking an
integer from the table and checking bit(s), the above two look
somewhat out of place.  The fact, as you noted, that they cannot be
used safely does not help, either.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-10 19:10   ` Junio C Hamano
@ 2023-02-10 19:25     ` Masahiro Yamada
  0 siblings, 0 replies; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-10 19:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyễn Thái Ngọc Duy

On Sat, Feb 11, 2023 at 4:11 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Masahiro Yamada <masahiroy@kernel.org> writes:
>
> > Use them with care because they are not robust against the pointer
> > increment, like isblank(*s++).
> >
> > The same issue already exists for isspace().
>
> Does it?


Sorry, I meant:

 "The same issue already exists for isprint()"




   #define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)


is not robust against pointer increment.




>
> >  #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
> >  #define isascii(x) (((x) & ~0x7f) == 0)
> >  #define isspace(x) sane_istest(x,GIT_SPACE)
>
> sane_istest() evaluates x and mask only once, and isspace evaluates
> x only once, no?
>
>         isspace(*x++)
>         ->
>         sane_istest(*x++,GIT_SPACE)
>         ->
>         ((sane_ctype[(unsigned char)(*x++)] & GIT_SPACE) != 0)
>
> > +#define isblank(x) (isspace(x) || (x) == '\t')
> > +#define isgraph(x) (isprint(x) && !isspace(x))
>
> Given that all the other tests are implemented with just picking an
> integer from the table and checking bit(s), the above two look
> somewhat out of place.  The fact, as you noted, that they cannot be
> used safely does not help, either.
>
>


At first, I hesitated to move these to git-compat-util.h
for this reason, but I noticed isprint() was in the same situation.

So, I decided to go.



-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-10  7:59 ` [PATCH 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
  2023-02-10 13:16   ` Ævar Arnfjörð Bjarmason
  2023-02-10 19:10   ` Junio C Hamano
@ 2023-02-10 22:03   ` René Scharfe
  2023-02-11  7:01     ` Masahiro Yamada
  2 siblings, 1 reply; 17+ messages in thread
From: René Scharfe @ 2023-02-10 22:03 UTC (permalink / raw)
  To: Masahiro Yamada, git; +Cc: Nguyễn Thái Ngọc Duy

Am 10.02.23 um 08:59 schrieb Masahiro Yamada:
> git-compat-util.h implements most of is*() macros.
>
> Add isblank() and isgraph(), which are useful to clean up wildmatch.c
> in a consistent way (in this and later commits).
>
> Use them with care because they are not robust against the pointer
> increment, like isblank(*s++).
>
> The same issue already exists for isspace().
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
>  git-compat-util.h |  4 ++++
>  wildmatch.c       | 14 ++------------
>  2 files changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 4f0028ce60..90b43b2bc9 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1212,10 +1212,12 @@ extern const unsigned char tolower_trans_tbl[256];
>  /* Sane ctype - no locale, and works with signed chars */
>  #undef isascii
>  #undef isspace
> +#undef isblank
>  #undef isdigit
>  #undef isalpha
>  #undef isalnum
>  #undef isprint
> +#undef isgraph
>  #undef islower
>  #undef isupper
>  #undef tolower
> @@ -1236,10 +1238,12 @@ extern const unsigned char sane_ctype[256];
>  #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
>  #define isascii(x) (((x) & ~0x7f) == 0)
>  #define isspace(x) sane_istest(x,GIT_SPACE)
> +#define isblank(x) (isspace(x) || (x) == '\t')

Our isspace() matches \t, \n, \r and space.  A standard implementation
would also match \v (vertical tab) and \f (form feed).  Anyway, your
isblank() here is the same as isspace() because the check for \t is
redundant...

>  #define isdigit(x) sane_istest(x,GIT_DIGIT)
>  #define isalpha(x) sane_istest(x,GIT_ALPHA)
>  #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
>  #define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)
> +#define isgraph(x) (isprint(x) && !isspace(x))
>  #define islower(x) sane_iscase(x, 1)
>  #define isupper(x) sane_iscase(x, 0)
>  #define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL)
> diff --git a/wildmatch.c b/wildmatch.c
> index 7e5a7ea1ea..85c4c7f8a7 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -28,18 +28,8 @@ typedef unsigned char uchar;
>  # define ISASCII(c) isascii(c)
>  #endif
>
> -#ifdef isblank
> -# define ISBLANK(c) (ISASCII(c) && isblank(c))
> -#else
> -# define ISBLANK(c) ((c) == ' ' || (c) == '\t')

... and that's not the case for the original code, which only
matches \t and space.

We already use eight bits for the lookup table values in sane_ctype.
Perhaps it's time to move GIT_GLOB_SPECIAL, GIT_REGEX_SPECIAL,
GIT_PATHSPEC_MAGIC and GIT_PUNCT to their own table to make room for
flags for isprint(), isgraph() and isblank().

> -#endif
> -
> -#ifdef isgraph
> -# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
> -#else
> -# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
> -#endif
> -
> +#define ISBLANK(c) (ISASCII(c) && isblank(c))
> +#define ISGRAPH(c) (ISASCII(c) && isgraph(c))
>  #define ISPRINT(c) (ISASCII(c) && isprint(c))
>  #define ISDIGIT(c) (ISASCII(c) && isdigit(c))
>  #define ISALNUM(c) (ISASCII(c) && isalnum(c))


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-10 22:03   ` René Scharfe
@ 2023-02-11  7:01     ` Masahiro Yamada
  2023-02-11 13:48       ` René Scharfe
  0 siblings, 1 reply; 17+ messages in thread
From: Masahiro Yamada @ 2023-02-11  7:01 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Nguyễn Thái Ngọc Duy

On Sat, Feb 11, 2023 at 7:03 AM René Scharfe <l.s.r@web.de> wrote:
>
> Am 10.02.23 um 08:59 schrieb Masahiro Yamada:
> > git-compat-util.h implements most of is*() macros.
> >
> > Add isblank() and isgraph(), which are useful to clean up wildmatch.c
> > in a consistent way (in this and later commits).
> >
> > Use them with care because they are not robust against the pointer
> > increment, like isblank(*s++).
> >
> > The same issue already exists for isspace().
> >
> > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> > ---
> >
> >  git-compat-util.h |  4 ++++
> >  wildmatch.c       | 14 ++------------
> >  2 files changed, 6 insertions(+), 12 deletions(-)
> >
> > diff --git a/git-compat-util.h b/git-compat-util.h
> > index 4f0028ce60..90b43b2bc9 100644
> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -1212,10 +1212,12 @@ extern const unsigned char tolower_trans_tbl[256];
> >  /* Sane ctype - no locale, and works with signed chars */
> >  #undef isascii
> >  #undef isspace
> > +#undef isblank
> >  #undef isdigit
> >  #undef isalpha
> >  #undef isalnum
> >  #undef isprint
> > +#undef isgraph
> >  #undef islower
> >  #undef isupper
> >  #undef tolower
> > @@ -1236,10 +1238,12 @@ extern const unsigned char sane_ctype[256];
> >  #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
> >  #define isascii(x) (((x) & ~0x7f) == 0)
> >  #define isspace(x) sane_istest(x,GIT_SPACE)
> > +#define isblank(x) (isspace(x) || (x) == '\t')
>
> Our isspace() matches \t, \n, \r and space.  A standard implementation
> would also match \v (vertical tab) and \f (form feed).  Anyway, your
> isblank() here is the same as isspace() because the check for \t is
> redundant...


My bad - I missed 'Z' in cane_ctype[].


>
> >  #define isdigit(x) sane_istest(x,GIT_DIGIT)
> >  #define isalpha(x) sane_istest(x,GIT_ALPHA)
> >  #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
> >  #define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)
> > +#define isgraph(x) (isprint(x) && !isspace(x))
> >  #define islower(x) sane_iscase(x, 1)
> >  #define isupper(x) sane_iscase(x, 0)
> >  #define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL)
> > diff --git a/wildmatch.c b/wildmatch.c
> > index 7e5a7ea1ea..85c4c7f8a7 100644
> > --- a/wildmatch.c
> > +++ b/wildmatch.c
> > @@ -28,18 +28,8 @@ typedef unsigned char uchar;
> >  # define ISASCII(c) isascii(c)
> >  #endif
> >
> > -#ifdef isblank
> > -# define ISBLANK(c) (ISASCII(c) && isblank(c))
> > -#else
> > -# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
>
> ... and that's not the case for the original code, which only
> matches \t and space.
>
> We already use eight bits for the lookup table values in sane_ctype.
> Perhaps it's time to move GIT_GLOB_SPECIAL, GIT_REGEX_SPECIAL,
> GIT_PATHSPEC_MAGIC and GIT_PUNCT to their own table to make room for
> flags for isprint(), isgraph() and isblank().



I think that is a good idea.

If we can have more room for isupper() and islower(),
we will be able to delete sane_iscase().


(I was thinking 'unsigned short', but having two tables is OK.)



So, how can I proceed with this patchset?



[A] After somebody refactors ctype.c table,
    I will rebase this series on top of that.

[B] keep isblank() and isgraph() local to wildmatch.c
    until that happens, so I can proceed without
    depending on the ctype.c refactoring.

    Apparently, wildmatch.c is not using a pointer
    increment with isblank() or isgraph().

[C] If 'somebody' in [A] is supposed to me,
    my v2 will include ctype refactoring.



Thought?





> > -#endif
> > -
> > -#ifdef isgraph
> > -# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
> > -#else
> > -# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
> > -#endif
> > -
> > +#define ISBLANK(c) (ISASCII(c) && isblank(c))
> > +#define ISGRAPH(c) (ISASCII(c) && isgraph(c))
> >  #define ISPRINT(c) (ISASCII(c) && isprint(c))
> >  #define ISDIGIT(c) (ISASCII(c) && isdigit(c))
> >  #define ISALNUM(c) (ISASCII(c) && isalnum(c))
>
--
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-11  7:01     ` Masahiro Yamada
@ 2023-02-11 13:48       ` René Scharfe
  2023-02-11 14:11         ` René Scharfe
  0 siblings, 1 reply; 17+ messages in thread
From: René Scharfe @ 2023-02-11 13:48 UTC (permalink / raw)
  To: Masahiro Yamada; +Cc: git, Nguyễn Thái Ngọc Duy

Am 11.02.23 um 08:01 schrieb Masahiro Yamada:
> On Sat, Feb 11, 2023 at 7:03 AM René Scharfe <l.s.r@web.de> wrote:
>>
>> We already use eight bits for the lookup table values in sane_ctype.
>> Perhaps it's time to move GIT_GLOB_SPECIAL, GIT_REGEX_SPECIAL,
>> GIT_PATHSPEC_MAGIC and GIT_PUNCT to their own table to make room for
>> flags for isprint(), isgraph() and isblank().
>
>
>
> I think that is a good idea.
>
> If we can have more room for isupper() and islower(),
> we will be able to delete sane_iscase().
>
>
> (I was thinking 'unsigned short', but having two tables is OK.)
>
>
>
> So, how can I proceed with this patchset?
>
>
>
> [A] After somebody refactors ctype.c table,
>     I will rebase this series on top of that.
>
> [B] keep isblank() and isgraph() local to wildmatch.c
>     until that happens, so I can proceed without
>     depending on the ctype.c refactoring.
>
>     Apparently, wildmatch.c is not using a pointer
>     increment with isblank() or isgraph().
>
> [C] If 'somebody' in [A] is supposed to me,
>     my v2 will include ctype refactoring.

[D] We need more tests first. :)  I sent patches to test the current
classifiers separately.  A similar test for isblank would have helped
you detect the mismatch quickly.  Full test coverage also gives
confidence when tinkering with the existing classifiers.

1c149ab2dd (ctype: support iscntrl, ispunct, isxdigit and isprint,
2012-10-15) added an implementation of isprint that evaluated its
argument only once, by the way, but the one from 0fcec2ce54
(format-patch: make rfc2047 encoding more strict, 2012-10-18)
replaced it.

Widening the element type of the lookup table would work, but might
impact performance.  I guess it would be slower, but perhaps we'd
have to measure it to be sure.  Splitting the table into unrelated
subsets would avoid that.

Deciding which flags to move requires knowing the full target set,
I think.  The punctuation-related flags looked to me like good
candidates until I saw the original isprint implementation which
uses them and several of the others.  So I'm not so sure anymore,
but here's a patch that moves them out:

---
 ctype.c           | 51 +++++++++++++++++++++++++++++++++++++----------
 git-compat-util.h | 21 ++++++++++---------
 2 files changed, 53 insertions(+), 19 deletions(-)

diff --git a/ctype.c b/ctype.c
index fc0225cebd..c7a0dcba13 100644
--- a/ctype.c
+++ b/ctype.c
@@ -9,26 +9,57 @@ enum {
 	S = GIT_SPACE,
 	A = GIT_ALPHA,
 	D = GIT_DIGIT,
-	G = GIT_GLOB_SPECIAL,	/* *, ?, [, \\ */
-	R = GIT_REGEX_SPECIAL,	/* $, (, ), +, ., ^, {, | */
-	P = GIT_PATHSPEC_MAGIC, /* other non-alnum, except for ] and } */
 	X = GIT_CNTRL,
-	U = GIT_PUNCT,
 	Z = GIT_CNTRL | GIT_SPACE
 };

 const unsigned char sane_ctype[256] = {
 	X, X, X, X, X, X, X, X, X, Z, Z, X, X, Z, X, X,		/*   0.. 15 */
 	X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X,		/*  16.. 31 */
-	S, P, P, P, R, P, P, P, R, R, G, R, P, P, R, P,		/*  32.. 47 */
-	D, D, D, D, D, D, D, D, D, D, P, P, P, P, P, G,		/*  48.. 63 */
-	P, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,		/*  64.. 79 */
-	A, A, A, A, A, A, A, A, A, A, A, G, G, U, R, P,		/*  80.. 95 */
-	P, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,		/*  96..111 */
-	A, A, A, A, A, A, A, A, A, A, A, R, R, U, P, X,		/* 112..127 */
+	S, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,		/*  32.. 47 */
+	D, D, D, D, D, D, D, D, D, D, 0, 0, 0, 0, 0, 0,		/*  48.. 63 */
+	0, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,		/*  64.. 79 */
+	A, A, A, A, A, A, A, A, A, A, A, 0, 0, 0, 0, 0,		/*  80.. 95 */
+	0, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,		/*  96..111 */
+	A, A, A, A, A, A, A, A, A, A, A, 0, 0, 0, 0, X,		/* 112..127 */
 	/* Nothing in the 128.. range */
 };

+const unsigned char sane_ctype2[256] = {
+	['*'] = GIT_GLOB_SPECIAL,
+	['?'] = GIT_GLOB_SPECIAL,
+	['['] = GIT_GLOB_SPECIAL,
+	['\\'] = GIT_GLOB_SPECIAL,
+	['$'] = GIT_REGEX_SPECIAL,
+	['('] = GIT_REGEX_SPECIAL,
+	[')'] = GIT_REGEX_SPECIAL,
+	['+'] = GIT_REGEX_SPECIAL,
+	['.'] = GIT_REGEX_SPECIAL,
+	['^'] = GIT_REGEX_SPECIAL,
+	['{'] = GIT_REGEX_SPECIAL,
+	['|'] = GIT_REGEX_SPECIAL,
+	['!'] = GIT_PATHSPEC_MAGIC,
+	['"'] = GIT_PATHSPEC_MAGIC,
+	['#'] = GIT_PATHSPEC_MAGIC,
+	['%'] = GIT_PATHSPEC_MAGIC,
+	['&'] = GIT_PATHSPEC_MAGIC,
+	['\''] = GIT_PATHSPEC_MAGIC,
+	[','] = GIT_PATHSPEC_MAGIC,
+	['-'] = GIT_PATHSPEC_MAGIC,
+	['/'] = GIT_PATHSPEC_MAGIC,
+	[':'] = GIT_PATHSPEC_MAGIC,
+	[';'] = GIT_PATHSPEC_MAGIC,
+	['<'] = GIT_PATHSPEC_MAGIC,
+	['='] = GIT_PATHSPEC_MAGIC,
+	['>'] = GIT_PATHSPEC_MAGIC,
+	['@'] = GIT_PATHSPEC_MAGIC,
+	['_'] = GIT_PATHSPEC_MAGIC,
+	['`'] = GIT_PATHSPEC_MAGIC,
+	['~'] = GIT_PATHSPEC_MAGIC,
+	[']'] = GIT_PUNCT,
+	['}'] = GIT_PUNCT,
+};
+
 /* For case-insensitive kwset */
 const unsigned char tolower_trans_tbl[256] = {
 	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
diff --git a/git-compat-util.h b/git-compat-util.h
index 4f0028ce60..61e71fe702 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1228,11 +1228,7 @@ extern const unsigned char sane_ctype[256];
 #define GIT_SPACE 0x01
 #define GIT_DIGIT 0x02
 #define GIT_ALPHA 0x04
-#define GIT_GLOB_SPECIAL 0x08
-#define GIT_REGEX_SPECIAL 0x10
-#define GIT_PATHSPEC_MAGIC 0x20
 #define GIT_CNTRL 0x40
-#define GIT_PUNCT 0x80
 #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
 #define isascii(x) (((x) & ~0x7f) == 0)
 #define isspace(x) sane_istest(x,GIT_SPACE)
@@ -1242,15 +1238,22 @@ extern const unsigned char sane_ctype[256];
 #define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)
 #define islower(x) sane_iscase(x, 1)
 #define isupper(x) sane_iscase(x, 0)
-#define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL)
-#define is_regex_special(x) sane_istest(x,GIT_GLOB_SPECIAL | GIT_REGEX_SPECIAL)
 #define iscntrl(x) (sane_istest(x,GIT_CNTRL))
-#define ispunct(x) sane_istest(x, GIT_PUNCT | GIT_REGEX_SPECIAL | \
-		GIT_GLOB_SPECIAL | GIT_PATHSPEC_MAGIC)
 #define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1)
 #define tolower(x) sane_case((unsigned char)(x), 0x20)
 #define toupper(x) sane_case((unsigned char)(x), 0)
-#define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC)
+
+extern const unsigned char sane_ctype2[256];
+#define GIT_GLOB_SPECIAL 0x01
+#define GIT_REGEX_SPECIAL 0x02
+#define GIT_PATHSPEC_MAGIC 0x04
+#define GIT_PUNCT 0x08
+#define sane_istest2(x,mask) ((sane_ctype2[(unsigned char)(x)] & (mask)) != 0)
+#define is_glob_special(x) sane_istest2(x,GIT_GLOB_SPECIAL)
+#define is_regex_special(x) sane_istest2(x,GIT_GLOB_SPECIAL | GIT_REGEX_SPECIAL)
+#define ispunct(x) sane_istest2(x, GIT_PUNCT | GIT_REGEX_SPECIAL | \
+		GIT_GLOB_SPECIAL | GIT_PATHSPEC_MAGIC)
+#define is_pathspec_magic(x) sane_istest2(x,GIT_PATHSPEC_MAGIC)

 static inline int sane_case(int x, int high)
 {
--
2.39.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-11 13:48       ` René Scharfe
@ 2023-02-11 14:11         ` René Scharfe
  0 siblings, 0 replies; 17+ messages in thread
From: René Scharfe @ 2023-02-11 14:11 UTC (permalink / raw)
  To: Masahiro Yamada; +Cc: git, Nguyễn Thái Ngọc Duy

Am 11.02.23 um 14:48 schrieb René Scharfe:
> Am 11.02.23 um 08:01 schrieb Masahiro Yamada:
>> On Sat, Feb 11, 2023 at 7:03 AM René Scharfe <l.s.r@web.de> wrote:
>>>
>>> We already use eight bits for the lookup table values in sane_ctype.
>>> Perhaps it's time to move GIT_GLOB_SPECIAL, GIT_REGEX_SPECIAL,
>>> GIT_PATHSPEC_MAGIC and GIT_PUNCT to their own table to make room for
>>> flags for isprint(), isgraph() and isblank().
>>
>>
>>
>> I think that is a good idea.
>>
>> If we can have more room for isupper() and islower(),
>> we will be able to delete sane_iscase().
>>
>>
>> (I was thinking 'unsigned short', but having two tables is OK.)
>>
>>
>>
>> So, how can I proceed with this patchset?
>>
>>
>>
>> [A] After somebody refactors ctype.c table,
>>     I will rebase this series on top of that.
>>
>> [B] keep isblank() and isgraph() local to wildmatch.c
>>     until that happens, so I can proceed without
>>     depending on the ctype.c refactoring.
>>
>>     Apparently, wildmatch.c is not using a pointer
>>     increment with isblank() or isgraph().
>>
>> [C] If 'somebody' in [A] is supposed to me,
>>     my v2 will include ctype refactoring.
>
> [D] We need more tests first. :)  I sent patches to test the current
> classifiers separately.  A similar test for isblank would have helped
> you detect the mismatch quickly.  Full test coverage also gives
> confidence when tinkering with the existing classifiers.
>
> 1c149ab2dd (ctype: support iscntrl, ispunct, isxdigit and isprint,
> 2012-10-15) added an implementation of isprint that evaluated its
> argument only once, by the way, but the one from 0fcec2ce54
> (format-patch: make rfc2047 encoding more strict, 2012-10-18)
> replaced it.
>
> Widening the element type of the lookup table would work, but might
> impact performance.  I guess it would be slower, but perhaps we'd
> have to measure it to be sure.  Splitting the table into unrelated
> subsets would avoid that.
>
> Deciding which flags to move requires knowing the full target set,
> I think.  The punctuation-related flags looked to me like good
> candidates until I saw the original isprint implementation which
> uses them and several of the others.  So I'm not so sure anymore,
> but here's a patch that moves them out:

Perhaps:

[E] Implement isblank and isgraph as inline functions and leave the
lookup table integration for later:

   #undef isblank
   #define isblank(x) sane_isblank(x)
   static inline int sane_isblank(int c) {return c == ' ' || c == '\t';}

   #undef isgraph
   #define isgraph(x) sane_isgraph(x)
   static inline int sane_isgraph(int c) {return isprint(c) && c != ' ';}

The Lazy Way™ ;)

René

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-02-11 14:12 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-10  7:59 [PATCH 0/5] Clean up wildmatch.c Masahiro Yamada
2023-02-10  7:59 ` [PATCH 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
2023-02-10 13:16   ` Ævar Arnfjörð Bjarmason
2023-02-10 16:56     ` Masahiro Yamada
2023-02-10 19:10   ` Junio C Hamano
2023-02-10 19:25     ` Masahiro Yamada
2023-02-10 22:03   ` René Scharfe
2023-02-11  7:01     ` Masahiro Yamada
2023-02-11 13:48       ` René Scharfe
2023-02-11 14:11         ` René Scharfe
2023-02-10  7:59 ` [PATCH 2/5] wildmatch: remove IS*() macros Masahiro Yamada
2023-02-10  7:59 ` [PATCH 3/5] wildmatch: remove NEGATE_CLASS and NEGATE_CLASS2 macros Masahiro Yamada
2023-02-10 13:11   ` Ævar Arnfjörð Bjarmason
2023-02-10 17:03     ` Masahiro Yamada
2023-02-10  7:59 ` [PATCH 4/5] wildmatch: use char instead of uchar Masahiro Yamada
2023-02-10 13:09   ` Ævar Arnfjörð Bjarmason
2023-02-10  7:59 ` [PATCH 5/5] wildmatch: more cleanups after killing uchar Masahiro Yamada

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).