git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/5] Clean up wildmatch.c
@ 2023-02-26 11:50 Masahiro Yamada
  2023-02-26 11:50 ` [PATCH v2 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Masahiro Yamada @ 2023-02-26 11:50 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

This file was imported from rsysc, but it was already largely modified
for GIT. Why not further cleanups? I see a lot of unneeded stubs.


Masahiro Yamada (5):
  git-compat-util: add isblank() and isgraph()
  wildmatch: remove IS*() macros
  wildmatch: remove NEGATE_CLASS(2) macros with trivial refactoring
  wildmatch: use char instead of uchar
  wildmatch: more cleanups after killing uchar

 git-compat-util.h |  14 ++++++
 wildmatch.c       | 108 +++++++++++++---------------------------------
 2 files changed, 45 insertions(+), 77 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-26 11:50 [PATCH v2 0/5] Clean up wildmatch.c Masahiro Yamada
@ 2023-02-26 11:50 ` Masahiro Yamada
  2023-02-26 18:45   ` René Scharfe
  2023-02-26 11:50 ` [PATCH v2 2/5] wildmatch: remove IS*() macros Masahiro Yamada
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Masahiro Yamada @ 2023-02-26 11:50 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

git-compat-util.h implements most of is*() macros.

Add isblank() and isgraph(), which are useful to clean up wildmatch.c
in a consistent way (in this and later commits).

In the previous submission, I just moved isblank() and isgraph() as
implemented in wildmatch.c. I knew they were not robust against the
pointer increment like isblank(*s++), but I thought it was the same
pattern as isprint(), which has the same issue. Unfortunately, it was
more controversial than I had expected...

This version implements them as inline functions because we ran out
all bits in the sane_ctype[] table. This is the same pattern as
islower() and isupper().

Once we refactor ctype.c to create more room in sane_ctype[], isblank()
and isgraph() will be able to use sane_istest(). Probably so will
islower() and isupper(). The ctype in Linux kernel (lib/ctype.c) has
the LOWER and UPPER bits separately.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

Changes in v2:
  - Use inline functions

 git-compat-util.h | 14 ++++++++++++++
 wildmatch.c       | 14 ++------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 4f0028ce60..b29c238f02 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1212,10 +1212,12 @@ extern const unsigned char tolower_trans_tbl[256];
 /* Sane ctype - no locale, and works with signed chars */
 #undef isascii
 #undef isspace
+#undef isblank
 #undef isdigit
 #undef isalpha
 #undef isalnum
 #undef isprint
+#undef isgraph
 #undef islower
 #undef isupper
 #undef tolower
@@ -1236,10 +1238,12 @@ extern const unsigned char sane_ctype[256];
 #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
 #define isascii(x) (((x) & ~0x7f) == 0)
 #define isspace(x) sane_istest(x,GIT_SPACE)
+#define isblank(x) sane_isblank(x)
 #define isdigit(x) sane_istest(x,GIT_DIGIT)
 #define isalpha(x) sane_istest(x,GIT_ALPHA)
 #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
 #define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)
+#define isgraph(x) sane_isgraph(x)
 #define islower(x) sane_iscase(x, 1)
 #define isupper(x) sane_iscase(x, 0)
 #define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL)
@@ -1270,6 +1274,16 @@ static inline int sane_iscase(int x, int is_lower)
 		return (x & 0x20) == 0;
 }
 
+static inline int sane_isblank(int c)
+{
+	return c == ' ' || c == '\t';
+}
+
+static inline int sane_isgraph(int c)
+{
+	return isprint(c) && !isspace(c);
+}
+
 /*
  * Like skip_prefix, but compare case-insensitively. Note that the comparison
  * is done via tolower(), so it is strictly ASCII (no multi-byte characters or
diff --git a/wildmatch.c b/wildmatch.c
index 7e5a7ea1ea..85c4c7f8a7 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -28,18 +28,8 @@ typedef unsigned char uchar;
 # define ISASCII(c) isascii(c)
 #endif
 
-#ifdef isblank
-# define ISBLANK(c) (ISASCII(c) && isblank(c))
-#else
-# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
-#endif
-
-#ifdef isgraph
-# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
-#else
-# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
-#endif
-
+#define ISBLANK(c) (ISASCII(c) && isblank(c))
+#define ISGRAPH(c) (ISASCII(c) && isgraph(c))
 #define ISPRINT(c) (ISASCII(c) && isprint(c))
 #define ISDIGIT(c) (ISASCII(c) && isdigit(c))
 #define ISALNUM(c) (ISASCII(c) && isalnum(c))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/5] wildmatch: remove IS*() macros
  2023-02-26 11:50 [PATCH v2 0/5] Clean up wildmatch.c Masahiro Yamada
  2023-02-26 11:50 ` [PATCH v2 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
@ 2023-02-26 11:50 ` Masahiro Yamada
  2023-02-26 11:50 ` [PATCH v2 3/5] wildmatch: remove NEGATE_CLASS(2) macros with trivial refactoring Masahiro Yamada
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Masahiro Yamada @ 2023-02-26 11:50 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

This file was imported from rsync, which has some compatibility
layer because it relies on <ctypes.h> in C standard library.

In contrast, GIT has its own implementations in git-compat-util.h.

[1] isprint, isgraph, isblank

   They check the given char range in an obvious way

[2] isspace, isdigit, isalpha, isalnum, islower, isupper, iscntr, ispunct

   They look up sane_ctype[], which fills the range 0x80-0xff with 0.

[3] isxdigit

   It looks up hexval_table[], which fills the range 0x80-0xff with -1.

For all of these, ISACII() is a redundant check.

Remove IS*() macros, and directly use is*() in dowild().

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

(no changes since v1)

 wildmatch.c | 55 ++++++++++++++++++-----------------------------------
 1 file changed, 18 insertions(+), 37 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index 85c4c7f8a7..a510b3fd23 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -22,25 +22,6 @@ typedef unsigned char uchar;
 				    && *(class) == *(litmatch) \
 				    && strncmp((char*)class, litmatch, len) == 0)
 
-#if defined STDC_HEADERS || !defined isascii
-# define ISASCII(c) 1
-#else
-# define ISASCII(c) isascii(c)
-#endif
-
-#define ISBLANK(c) (ISASCII(c) && isblank(c))
-#define ISGRAPH(c) (ISASCII(c) && isgraph(c))
-#define ISPRINT(c) (ISASCII(c) && isprint(c))
-#define ISDIGIT(c) (ISASCII(c) && isdigit(c))
-#define ISALNUM(c) (ISASCII(c) && isalnum(c))
-#define ISALPHA(c) (ISASCII(c) && isalpha(c))
-#define ISCNTRL(c) (ISASCII(c) && iscntrl(c))
-#define ISLOWER(c) (ISASCII(c) && islower(c))
-#define ISPUNCT(c) (ISASCII(c) && ispunct(c))
-#define ISSPACE(c) (ISASCII(c) && isspace(c))
-#define ISUPPER(c) (ISASCII(c) && isupper(c))
-#define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
-
 /* Match pattern "p" against "text" */
 static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 {
@@ -52,9 +33,9 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 		uchar t_ch, prev_ch;
 		if ((t_ch = *text) == '\0' && p_ch != '*')
 			return WM_ABORT_ALL;
-		if ((flags & WM_CASEFOLD) && ISUPPER(t_ch))
+		if ((flags & WM_CASEFOLD) && isupper(t_ch))
 			t_ch = tolower(t_ch);
-		if ((flags & WM_CASEFOLD) && ISUPPER(p_ch))
+		if ((flags & WM_CASEFOLD) && isupper(p_ch))
 			p_ch = tolower(p_ch);
 		switch (p_ch) {
 		case '\\':
@@ -133,11 +114,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 				 */
 				if (!is_glob_special(*p)) {
 					p_ch = *p;
-					if ((flags & WM_CASEFOLD) && ISUPPER(p_ch))
+					if ((flags & WM_CASEFOLD) && isupper(p_ch))
 						p_ch = tolower(p_ch);
 					while ((t_ch = *text) != '\0' &&
 					       (match_slash || t_ch != '/')) {
-						if ((flags & WM_CASEFOLD) && ISUPPER(t_ch))
+						if ((flags & WM_CASEFOLD) && isupper(t_ch))
 							t_ch = tolower(t_ch);
 						if (t_ch == p_ch)
 							break;
@@ -186,7 +167,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 					}
 					if (t_ch <= p_ch && t_ch >= prev_ch)
 						matched = 1;
-					else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) {
+					else if ((flags & WM_CASEFOLD) && islower(t_ch)) {
 						uchar t_ch_upper = toupper(t_ch);
 						if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
 							matched = 1;
@@ -208,42 +189,42 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 						continue;
 					}
 					if (CC_EQ(s,i, "alnum")) {
-						if (ISALNUM(t_ch))
+						if (isalnum(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "alpha")) {
-						if (ISALPHA(t_ch))
+						if (isalpha(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "blank")) {
-						if (ISBLANK(t_ch))
+						if (isblank(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "cntrl")) {
-						if (ISCNTRL(t_ch))
+						if (iscntrl(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "digit")) {
-						if (ISDIGIT(t_ch))
+						if (isdigit(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "graph")) {
-						if (ISGRAPH(t_ch))
+						if (isgraph(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "lower")) {
-						if (ISLOWER(t_ch))
+						if (islower(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "print")) {
-						if (ISPRINT(t_ch))
+						if (isprint(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "punct")) {
-						if (ISPUNCT(t_ch))
+						if (ispunct(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "space")) {
-						if (ISSPACE(t_ch))
+						if (isspace(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "upper")) {
-						if (ISUPPER(t_ch))
+						if (isupper(t_ch))
 							matched = 1;
-						else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch))
+						else if ((flags & WM_CASEFOLD) && islower(t_ch))
 							matched = 1;
 					} else if (CC_EQ(s,i, "xdigit")) {
-						if (ISXDIGIT(t_ch))
+						if (isxdigit(t_ch))
 							matched = 1;
 					} else /* malformed [:class:] string */
 						return WM_ABORT_ALL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/5] wildmatch: remove NEGATE_CLASS(2) macros with trivial refactoring
  2023-02-26 11:50 [PATCH v2 0/5] Clean up wildmatch.c Masahiro Yamada
  2023-02-26 11:50 ` [PATCH v2 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
  2023-02-26 11:50 ` [PATCH v2 2/5] wildmatch: remove IS*() macros Masahiro Yamada
@ 2023-02-26 11:50 ` Masahiro Yamada
  2023-02-26 11:50 ` [PATCH v2 4/5] wildmatch: use char instead of uchar Masahiro Yamada
  2023-02-26 11:50 ` [PATCH v2 5/5] wildmatch: more cleanups after killing uchar Masahiro Yamada
  4 siblings, 0 replies; 9+ messages in thread
From: Masahiro Yamada @ 2023-02-26 11:50 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

The other glob patterns are hard-coded in dowild(). There is no need
to macrofy '!' or '^'. Remove the NEGATE_CLASS and REGATE_CLASS2
defines, then refactor the code.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

(no changes since v1)

 wildmatch.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index a510b3fd23..93800b8eac 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -14,10 +14,6 @@
 
 typedef unsigned char uchar;
 
-/* What character marks an inverted character class? */
-#define NEGATE_CLASS	'!'
-#define NEGATE_CLASS2	'^'
-
 #define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
 				    && *(class) == *(litmatch) \
 				    && strncmp((char*)class, litmatch, len) == 0)
@@ -137,12 +133,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 			return WM_ABORT_ALL;
 		case '[':
 			p_ch = *++p;
-#ifdef NEGATE_CLASS2
-			if (p_ch == NEGATE_CLASS2)
-				p_ch = NEGATE_CLASS;
-#endif
 			/* Assign literal 1/0 because of "matched" comparison. */
-			negated = p_ch == NEGATE_CLASS ? 1 : 0;
+			negated = p_ch == '!' || p_ch == '^' ? 1 : 0;
 			if (negated) {
 				/* Inverted character class. */
 				p_ch = *++p;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 4/5] wildmatch: use char instead of uchar
  2023-02-26 11:50 [PATCH v2 0/5] Clean up wildmatch.c Masahiro Yamada
                   ` (2 preceding siblings ...)
  2023-02-26 11:50 ` [PATCH v2 3/5] wildmatch: remove NEGATE_CLASS(2) macros with trivial refactoring Masahiro Yamada
@ 2023-02-26 11:50 ` Masahiro Yamada
  2023-02-27 20:07   ` Junio C Hamano
  2023-02-26 11:50 ` [PATCH v2 5/5] wildmatch: more cleanups after killing uchar Masahiro Yamada
  4 siblings, 1 reply; 9+ messages in thread
From: Masahiro Yamada @ 2023-02-26 11:50 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

dowild() casts (char *) and (uchar *) back-and-forth, which is
ugly.

This file was imported from rsync, which started to use (unsigned char)
since the following commit:

 | commit e11c42511903adc6d27cf1671cc76fa711ea37e5
 | Author: Wayne Davison <wayned@samba.org>
 | Date:   Sun Jul 6 04:33:54 2003 +0000
 |
 |     - Added [:class:] handling to the character-class code.
 |     - Use explicit unsigned characters for proper set checks.
 |     - Made the character-class code honor backslash escapes.
 |     - Accept '^' as a class-negation character in addition to '!'.

Perhaps, it was needed because rsync relies on is*() from <ctypes.h>.

GIT has its own implementations, so the behavior is clear.

In fact, commit 4546738b58a0 ("Unlocalized isspace and friends")
says one of the motivations is "we want the right signed behaviour".

sane_istest() casts the given character to (unsigned char) anyway
before sane_ctype[] table lookup, so dowild() can use 'char'.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

(no changes since v1)

 wildmatch.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index 93800b8eac..7dffd783cb 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -12,21 +12,19 @@
 #include "cache.h"
 #include "wildmatch.h"
 
-typedef unsigned char uchar;
-
 #define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
 				    && *(class) == *(litmatch) \
-				    && strncmp((char*)class, litmatch, len) == 0)
+				    && strncmp(class, litmatch, len) == 0)
 
 /* Match pattern "p" against "text" */
-static int dowild(const uchar *p, const uchar *text, unsigned int flags)
+static int dowild(const char *p, const char *text, unsigned int flags)
 {
-	uchar p_ch;
-	const uchar *pattern = p;
+	char p_ch;
+	const char *pattern = p;
 
 	for ( ; (p_ch = *p) != '\0'; text++, p++) {
 		int matched, match_slash, negated;
-		uchar t_ch, prev_ch;
+		char t_ch, prev_ch;
 		if ((t_ch = *text) == '\0' && p_ch != '*')
 			return WM_ABORT_ALL;
 		if ((flags & WM_CASEFOLD) && isupper(t_ch))
@@ -50,7 +48,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 			continue;
 		case '*':
 			if (*++p == '*') {
-				const uchar *prev_p = p - 2;
+				const char *prev_p = p - 2;
 				while (*++p == '*') {}
 				if (!(flags & WM_PATHNAME))
 					/* without WM_PATHNAME, '*' == '**' */
@@ -90,10 +88,10 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 				 * with WM_PATHNAME matches the next
 				 * directory
 				 */
-				const char *slash = strchr((char*)text, '/');
+				const char *slash = strchr(text, '/');
 				if (!slash)
 					return WM_NOMATCH;
-				text = (const uchar*)slash;
+				text = slash;
 				/* the slash is consumed by the top-level for loop */
 				break;
 			}
@@ -160,13 +158,13 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 					if (t_ch <= p_ch && t_ch >= prev_ch)
 						matched = 1;
 					else if ((flags & WM_CASEFOLD) && islower(t_ch)) {
-						uchar t_ch_upper = toupper(t_ch);
+						char t_ch_upper = toupper(t_ch);
 						if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
 							matched = 1;
 					}
 					p_ch = 0; /* This makes "prev_ch" get set to 0. */
 				} else if (p_ch == '[' && p[1] == ':') {
-					const uchar *s;
+					const char *s;
 					int i;
 					for (s = p += 2; (p_ch = *p) && p_ch != ']'; p++) {} /*SHARED ITERATOR*/
 					if (!p_ch)
@@ -237,5 +235,5 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 /* Match the "pattern" against the "text" string. */
 int wildmatch(const char *pattern, const char *text, unsigned int flags)
 {
-	return dowild((const uchar*)pattern, (const uchar*)text, flags);
+	return dowild(pattern, text, flags);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 5/5] wildmatch: more cleanups after killing uchar
  2023-02-26 11:50 [PATCH v2 0/5] Clean up wildmatch.c Masahiro Yamada
                   ` (3 preceding siblings ...)
  2023-02-26 11:50 ` [PATCH v2 4/5] wildmatch: use char instead of uchar Masahiro Yamada
@ 2023-02-26 11:50 ` Masahiro Yamada
  4 siblings, 0 replies; 9+ messages in thread
From: Masahiro Yamada @ 2023-02-26 11:50 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy, Masahiro Yamada

Remove the local function dowild(), which is now equivalent to
wildmatch().

Remove the local variable, slash.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

(no changes since v1)

 wildmatch.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index 7dffd783cb..24577e9b8e 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -17,7 +17,7 @@
 				    && strncmp(class, litmatch, len) == 0)
 
 /* Match pattern "p" against "text" */
-static int dowild(const char *p, const char *text, unsigned int flags)
+int wildmatch(const char *p, const char *text, unsigned int flags)
 {
 	char p_ch;
 	const char *pattern = p;
@@ -66,7 +66,7 @@ static int dowild(const char *p, const char *text, unsigned int flags)
 					 * both foo/bar and foo/a/bar.
 					 */
 					if (p[0] == '/' &&
-					    dowild(p + 1, text, flags) == WM_MATCH)
+					    wildmatch(p + 1, text, flags) == WM_MATCH)
 						return WM_MATCH;
 					match_slash = 1;
 				} else /* WM_PATHNAME is set */
@@ -88,10 +88,9 @@ static int dowild(const char *p, const char *text, unsigned int flags)
 				 * with WM_PATHNAME matches the next
 				 * directory
 				 */
-				const char *slash = strchr(text, '/');
-				if (!slash)
+				text = strchr(text, '/');
+				if (!text)
 					return WM_NOMATCH;
-				text = slash;
 				/* the slash is consumed by the top-level for loop */
 				break;
 			}
@@ -121,7 +120,7 @@ static int dowild(const char *p, const char *text, unsigned int flags)
 					if (t_ch != p_ch)
 						return WM_NOMATCH;
 				}
-				if ((matched = dowild(p, text, flags)) != WM_NOMATCH) {
+				if ((matched = wildmatch(p, text, flags)) != WM_NOMATCH) {
 					if (!match_slash || matched != WM_ABORT_TO_STARSTAR)
 						return matched;
 				} else if (!match_slash && t_ch == '/')
@@ -231,9 +230,3 @@ static int dowild(const char *p, const char *text, unsigned int flags)
 
 	return *text ? WM_NOMATCH : WM_MATCH;
 }
-
-/* Match the "pattern" against the "text" string. */
-int wildmatch(const char *pattern, const char *text, unsigned int flags)
-{
-	return dowild(pattern, text, flags);
-}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-26 11:50 ` [PATCH v2 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
@ 2023-02-26 18:45   ` René Scharfe
  2023-02-27 19:39     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: René Scharfe @ 2023-02-26 18:45 UTC (permalink / raw)
  To: Masahiro Yamada, git; +Cc: Nguyễn Thái Ngọc Duy

Am 26.02.23 um 12:50 schrieb Masahiro Yamada:
> git-compat-util.h implements most of is*() macros.
>
> Add isblank() and isgraph(), which are useful to clean up wildmatch.c
> in a consistent way (in this and later commits).
>
> In the previous submission, I just moved isblank() and isgraph() as
> implemented in wildmatch.c. I knew they were not robust against the
> pointer increment like isblank(*s++), but I thought it was the same
> pattern as isprint(), which has the same issue. Unfortunately, it was
> more controversial than I had expected...

Not sure we need that story in the commit message, but it gave me an
idea: To go back to the isprint() version from 1c149ab2dd (ctype:
support iscntrl, ispunct, isxdigit and isprint, 2012-10-15), which
evaluates its argument only once:

#define isprint(x) (sane_istest(x, GIT_ALPHA | GIT_DIGIT | GIT_SPACE | \
		GIT_PUNCT | GIT_REGEX_SPECIAL | GIT_GLOB_SPECIAL | \
		GIT_PATHSPEC_MAGIC))

But then I realized that it wrongly classifies \t, \r and \n as
being printing characters; 567342fc77 (test-ctype: test iscntrl,
ispunct, isxdigit and isprint, 2023-02-13) shows it.  So it's not so
easy, however, ...

> This version implements them as inline functions because we ran out
> all bits in the sane_ctype[] table. This is the same pattern as
> islower() and isupper().

... if you remove GIT_SPACE from the definition above you get a
macro version of isgraph() that uses a single table lookup.

> Once we refactor ctype.c to create more room in sane_ctype[], isblank()
> and isgraph() will be able to use sane_istest(). Probably so will
> islower() and isupper(). The ctype in Linux kernel (lib/ctype.c) has
> the LOWER and UPPER bits separately.

If we're out of bits then isblank() is a good choice to implement
without a table lookup, as this class only contains two characters
and two comparisons should be quite fast.

Stepping back a bit: Is using the unlocalized is* macros in
wildmatch() safe, i.e. do we get the same results as before
regardless of locale?  Junio's remark in
https://lore.kernel.org/git/xmqq3579crsd.fsf@gitster.g/ sounds
convincing to me if we don't care about single-byte code pages
and require plain ASCII or UTF-8.  I think it's a good idea to
address that point in the commit message.

And adding tests to t/helper/test-ctype.c would be nice.

> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
> Changes in v2:
>   - Use inline functions
>
>  git-compat-util.h | 14 ++++++++++++++
>  wildmatch.c       | 14 ++------------
>  2 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 4f0028ce60..b29c238f02 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1212,10 +1212,12 @@ extern const unsigned char tolower_trans_tbl[256];
>  /* Sane ctype - no locale, and works with signed chars */
>  #undef isascii
>  #undef isspace
> +#undef isblank
>  #undef isdigit
>  #undef isalpha
>  #undef isalnum
>  #undef isprint
> +#undef isgraph
>  #undef islower
>  #undef isupper
>  #undef tolower
> @@ -1236,10 +1238,12 @@ extern const unsigned char sane_ctype[256];
>  #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
>  #define isascii(x) (((x) & ~0x7f) == 0)
>  #define isspace(x) sane_istest(x,GIT_SPACE)
> +#define isblank(x) sane_isblank(x)
>  #define isdigit(x) sane_istest(x,GIT_DIGIT)
>  #define isalpha(x) sane_istest(x,GIT_ALPHA)
>  #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
>  #define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)
> +#define isgraph(x) sane_isgraph(x)
>  #define islower(x) sane_iscase(x, 1)
>  #define isupper(x) sane_iscase(x, 0)
>  #define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL)
> @@ -1270,6 +1274,16 @@ static inline int sane_iscase(int x, int is_lower)
>  		return (x & 0x20) == 0;
>  }
>
> +static inline int sane_isblank(int c)
> +{
> +	return c == ' ' || c == '\t';
> +}
> +
> +static inline int sane_isgraph(int c)
> +{
> +	return isprint(c) && !isspace(c);
> +}
> +
>  /*
>   * Like skip_prefix, but compare case-insensitively. Note that the comparison
>   * is done via tolower(), so it is strictly ASCII (no multi-byte characters or
> diff --git a/wildmatch.c b/wildmatch.c
> index 7e5a7ea1ea..85c4c7f8a7 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -28,18 +28,8 @@ typedef unsigned char uchar;
>  # define ISASCII(c) isascii(c)
>  #endif
>
> -#ifdef isblank
> -# define ISBLANK(c) (ISASCII(c) && isblank(c))
> -#else
> -# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
> -#endif
> -
> -#ifdef isgraph
> -# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
> -#else
> -# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
> -#endif
> -
> +#define ISBLANK(c) (ISASCII(c) && isblank(c))
> +#define ISGRAPH(c) (ISASCII(c) && isgraph(c))
>  #define ISPRINT(c) (ISASCII(c) && isprint(c))
>  #define ISDIGIT(c) (ISASCII(c) && isdigit(c))
>  #define ISALNUM(c) (ISASCII(c) && isalnum(c))


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/5] git-compat-util: add isblank() and isgraph()
  2023-02-26 18:45   ` René Scharfe
@ 2023-02-27 19:39     ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2023-02-27 19:39 UTC (permalink / raw)
  To: René Scharfe
  Cc: Masahiro Yamada, git, Nguyễn Thái Ngọc Duy

René Scharfe <l.s.r@web.de> writes:

>> In the previous submission, I just moved isblank() and isgraph() as
>> implemented in wildmatch.c. I knew they were not robust against the
>> pointer increment like isblank(*s++), but I thought it was the same
>> pattern as isprint(), which has the same issue. Unfortunately, it was
>> more controversial than I had expected...
>
> Not sure we need that story in the commit message,...

Usually we encourage people to write as if there were no "previous
iterations".  Describe how to reach the best end-result without any
detour, using the experience gained from failed attempts.

> ... but it gave me an idea:

;-)

> ...  So it's not so
> easy, however, ...
>
>> This version implements them as inline functions because we ran out
>> all bits in the sane_ctype[] table. This is the same pattern as
>> islower() and isupper().
>
> ... if you remove GIT_SPACE from the definition above you get a
> macro version of isgraph() that uses a single table lookup.
>
> If we're out of bits then isblank() is a good choice to implement
> without a table lookup, as this class only contains two characters
> and two comparisons should be quite fast.

Implementing sane_isblank() plus sane_isgraph() as static inlines is
already not too bad, but if one of them can be made into a simple
table look-up, that indeed is better ;-)

> Stepping back a bit: Is using the unlocalized is* macros in
> wildmatch() safe, i.e. do we get the same results as before
> regardless of locale?  Junio's remark in
> https://lore.kernel.org/git/xmqq3579crsd.fsf@gitster.g/ sounds
> convincing to me if we don't care about single-byte code pages
> and require plain ASCII or UTF-8.  I think it's a good idea to
> address that point in the commit message.

True.  To be honest, I wasn't thinking about non-UTF-8 single-byte
"code pages".  In the context of wildmatch, the strings we deal with
mostly interact with the paths on the filesystem.  If you interact
with your filesystem in latin-1 that may pose an issue, and even if
(or rather, especially if) we are to declare that it does not matter,
we should document it in the proposed log message.
>
> And adding tests to t/helper/test-ctype.c would be nice.

Very true.

>> +#define isblank(x) sane_isblank(x)
>> +#define isgraph(x) sane_isgraph(x)
>> @@ -1270,6 +1274,16 @@ static inline int sane_iscase(int x, int is_lower)
>>  		return (x & 0x20) == 0;
>>  }
>>
>> +static inline int sane_isblank(int c)
>> +{
>> +	return c == ' ' || c == '\t';
>> +}
>> +
>> +static inline int sane_isgraph(int c)
>> +{
>> +	return isprint(c) && !isspace(c);
>> +}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 4/5] wildmatch: use char instead of uchar
  2023-02-26 11:50 ` [PATCH v2 4/5] wildmatch: use char instead of uchar Masahiro Yamada
@ 2023-02-27 20:07   ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2023-02-27 20:07 UTC (permalink / raw)
  To: Masahiro Yamada; +Cc: git, Nguyễn Thái Ngọc Duy

Masahiro Yamada <masahiroy@kernel.org> writes:

> GIT has its own implementations, so the behavior is clear.
>
> In fact, commit 4546738b58a0 ("Unlocalized isspace and friends")
> says one of the motivations is "we want the right signed behaviour".
>
> sane_istest() casts the given character to (unsigned char) anyway
> before sane_ctype[] table lookup, so dowild() can use 'char'.

Use of values taken from a char/uchar in sane_istest() is designed
to be safe, and between using char*/uchar* to scan pieces of memory
would not make much difference, so changes like ...

>  		case '*':
>  			if (*++p == '*') {
> -				const uchar *prev_p = p - 2;
> +				const char *prev_p = p - 2;

... this is safe, and so is ...

> -				const char *slash = strchr((char*)text, '/');
> +				const char *slash = strchr(text, '/');

... this.

But does the comparison between t_ch_upper and prev_ch behave the
same with and without this patch?

> -						uchar t_ch_upper = toupper(t_ch);
> +						char t_ch_upper = toupper(t_ch);
>  						if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)

Here t_ch is already known to pass islower(), so t_ch_upper would be
within a reasonable range and "char" should be able to store it
safely without its value wrapping around to negative.  Do we have a
similar guarantee on prev_ch, or can it be more than 128, which
would have caused the original to say "t_ch_upper is smaller than
prev_ch" but now because prev_ch can wrap around to a negative
value, leading the conditional to behave differently, or something?

>  							matched = 1;
>  					}
>  					p_ch = 0; /* This makes "prev_ch" get set to 0. */
>  				} else if (p_ch == '[' && p[1] == ':') {
> -					const uchar *s;
> +					const char *s;
>  					int i;
>  					for (s = p += 2; (p_ch = *p) && p_ch != ']'; p++) {} /*SHARED ITERATOR*/
>  					if (!p_ch)
> @@ -237,5 +235,5 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  /* Match the "pattern" against the "text" string. */
>  int wildmatch(const char *pattern, const char *text, unsigned int flags)
>  {
> -	return dowild((const uchar*)pattern, (const uchar*)text, flags);
> +	return dowild(pattern, text, flags);
>  }

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-02-27 20:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-26 11:50 [PATCH v2 0/5] Clean up wildmatch.c Masahiro Yamada
2023-02-26 11:50 ` [PATCH v2 1/5] git-compat-util: add isblank() and isgraph() Masahiro Yamada
2023-02-26 18:45   ` René Scharfe
2023-02-27 19:39     ` Junio C Hamano
2023-02-26 11:50 ` [PATCH v2 2/5] wildmatch: remove IS*() macros Masahiro Yamada
2023-02-26 11:50 ` [PATCH v2 3/5] wildmatch: remove NEGATE_CLASS(2) macros with trivial refactoring Masahiro Yamada
2023-02-26 11:50 ` [PATCH v2 4/5] wildmatch: use char instead of uchar Masahiro Yamada
2023-02-27 20:07   ` Junio C Hamano
2023-02-26 11:50 ` [PATCH v2 5/5] wildmatch: more cleanups after killing uchar Masahiro Yamada

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).