git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] test-ctype: test all classifiers
@ 2023-02-11 13:04 René Scharfe
  2023-02-11 13:12 ` [PATCH 1/3] test-ctype: test isascii René Scharfe
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: René Scharfe @ 2023-02-11 13:04 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Masahiro Yamada

Add tests to cover the remaining character class test functions.

  test-ctype: test isascii
  test-ctype: test islower and isupper
  test-ctype: test iscntrl, ispunct, isxdigit and isprint

 t/helper/test-ctype.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

--
2.39.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] test-ctype: test isascii
  2023-02-11 13:04 [PATCH 0/3] test-ctype: test all classifiers René Scharfe
@ 2023-02-11 13:12 ` René Scharfe
  2023-02-11 19:48   ` Junio C Hamano
  2023-02-11 13:12 ` [PATCH 2/3] test-ctype: test islower and isupper René Scharfe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: René Scharfe @ 2023-02-11 13:12 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Masahiro Yamada

Test the character classifier added by c2e9364a06 (cleanup: add
isascii(), 2009-03-07).  It returns 1 for NUL as well, which requires
special treatment, as our string-based tester can't find it with
strcmp(3).  Allow NUL to be given as the first character in a class
specification string.  This has the downside of no longer supporting
the empty string, but that's OK since we are not interested in testing
character classes with no members.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/helper/test-ctype.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
index 92c4c2313e..caf586649f 100644
--- a/t/helper/test-ctype.c
+++ b/t/helper/test-ctype.c
@@ -11,9 +11,14 @@ static void report_error(const char *class, int ch)

 static int is_in(const char *s, int ch)
 {
-	/* We can't find NUL using strchr.  It's classless anyway. */
+	/*
+	 * We can't find NUL using strchr. Accept it as the first
+	 * character in the spec -- there are no empty classes.
+	 */
 	if (ch == '\0')
-		return 0;
+		return ch == *s;
+	if (*s == '\0')
+		s++;
 	return !!strchr(s, ch);
 }

@@ -28,6 +33,15 @@ static int is_in(const char *s, int ch)
 #define DIGIT "0123456789"
 #define LOWER "abcdefghijklmnopqrstuvwxyz"
 #define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+#define ASCII \
+	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
+	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
+	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
+	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
+	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
+	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"

 int cmd__ctype(int argc, const char **argv)
 {
@@ -38,6 +52,7 @@ int cmd__ctype(int argc, const char **argv)
 	TEST_CLASS(is_glob_special, "*?[\\");
 	TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
 	TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
+	TEST_CLASS(isascii, ASCII);

 	return rc;
 }
--
2.39.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] test-ctype: test islower and isupper
  2023-02-11 13:04 [PATCH 0/3] test-ctype: test all classifiers René Scharfe
  2023-02-11 13:12 ` [PATCH 1/3] test-ctype: test isascii René Scharfe
@ 2023-02-11 13:12 ` René Scharfe
  2023-02-11 13:12 ` [PATCH 3/3] test-ctype: test iscntrl, ispunct, isxdigit and isprint René Scharfe
  2023-02-13 21:08 ` [PATCH v2 0/3] test-ctype: test all classifiers René Scharfe
  3 siblings, 0 replies; 14+ messages in thread
From: René Scharfe @ 2023-02-11 13:12 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Masahiro Yamada

Test the character classifiers added by 43ccdf56ec (ctype: implement
islower/isupper macro, 2012-02-10).

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/helper/test-ctype.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
index caf586649f..8ac76e93e4 100644
--- a/t/helper/test-ctype.c
+++ b/t/helper/test-ctype.c
@@ -53,6 +53,8 @@ int cmd__ctype(int argc, const char **argv)
 	TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
 	TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
 	TEST_CLASS(isascii, ASCII);
+	TEST_CLASS(islower, LOWER);
+	TEST_CLASS(isupper, UPPER);

 	return rc;
 }
--
2.39.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] test-ctype: test iscntrl, ispunct, isxdigit and isprint
  2023-02-11 13:04 [PATCH 0/3] test-ctype: test all classifiers René Scharfe
  2023-02-11 13:12 ` [PATCH 1/3] test-ctype: test isascii René Scharfe
  2023-02-11 13:12 ` [PATCH 2/3] test-ctype: test islower and isupper René Scharfe
@ 2023-02-11 13:12 ` René Scharfe
  2023-02-13 21:08 ` [PATCH v2 0/3] test-ctype: test all classifiers René Scharfe
  3 siblings, 0 replies; 14+ messages in thread
From: René Scharfe @ 2023-02-11 13:12 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Masahiro Yamada

Test the character classifiers added by 1c149ab2dd (ctype: support
iscntrl, ispunct, isxdigit and isprint, 2012-10-15) and 0fcec2ce54
(format-patch: make rfc2047 encoding more strict, 2012-10-18).

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/helper/test-ctype.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
index 8ac76e93e4..d04da0d465 100644
--- a/t/helper/test-ctype.c
+++ b/t/helper/test-ctype.c
@@ -42,6 +42,17 @@ static int is_in(const char *s, int ch)
 	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
 	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
 	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+#define CNTRL \
+	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+	"\x7f"
+#define PRINT \
+	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
+	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
+	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
+	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
+	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
+	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e"

 int cmd__ctype(int argc, const char **argv)
 {
@@ -55,6 +66,10 @@ int cmd__ctype(int argc, const char **argv)
 	TEST_CLASS(isascii, ASCII);
 	TEST_CLASS(islower, LOWER);
 	TEST_CLASS(isupper, UPPER);
+	TEST_CLASS(iscntrl, CNTRL);
+	TEST_CLASS(ispunct, "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~");
+	TEST_CLASS(isxdigit, DIGIT "abcdefABCDEF");
+	TEST_CLASS(isprint, PRINT);

 	return rc;
 }
--
2.39.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] test-ctype: test isascii
  2023-02-11 13:12 ` [PATCH 1/3] test-ctype: test isascii René Scharfe
@ 2023-02-11 19:48   ` Junio C Hamano
  2023-02-12  9:48     ` René Scharfe
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2023-02-11 19:48 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Masahiro Yamada

René Scharfe <l.s.r@web.de> writes:

> Test the character classifier added by c2e9364a06 (cleanup: add
> isascii(), 2009-03-07).  It returns 1 for NUL as well, which requires
> special treatment, as our string-based tester can't find it with
> strcmp(3).  Allow NUL to be given as the first character in a class
> specification string.  This has the downside of no longer supporting
> the empty string, but that's OK since we are not interested in testing
> character classes with no members.

I wonder how effective a test we can have by checking a table we use
in production (i.e. ctype.c::sane_ctype[]) against another table we
use only for testing (i.e. string literals in test-ctype.c), but
that is not something new in this series.

I do not offhand know if the string literal prefixed with NUL is
safe against clever compilers; my gut feeling says it should
(i.e. allowing such an "optimization" does not seem to have much
merit), but my gut has been wrong many times in this area, so...

>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  t/helper/test-ctype.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
> index 92c4c2313e..caf586649f 100644
> --- a/t/helper/test-ctype.c
> +++ b/t/helper/test-ctype.c
> @@ -11,9 +11,14 @@ static void report_error(const char *class, int ch)
>
>  static int is_in(const char *s, int ch)
>  {
> -	/* We can't find NUL using strchr.  It's classless anyway. */
> +	/*
> +	 * We can't find NUL using strchr. Accept it as the first
> +	 * character in the spec -- there are no empty classes.
> +	 */
>  	if (ch == '\0')
> -		return 0;
> +		return ch == *s;
> +	if (*s == '\0')
> +		s++;
>  	return !!strchr(s, ch);
>  }
>
> @@ -28,6 +33,15 @@ static int is_in(const char *s, int ch)
>  #define DIGIT "0123456789"
>  #define LOWER "abcdefghijklmnopqrstuvwxyz"
>  #define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> +#define ASCII \
> +	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
> +	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
> +	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
> +	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
> +	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
> +	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
> +	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
> +	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
>
>  int cmd__ctype(int argc, const char **argv)
>  {
> @@ -38,6 +52,7 @@ int cmd__ctype(int argc, const char **argv)
>  	TEST_CLASS(is_glob_special, "*?[\\");
>  	TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
>  	TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
> +	TEST_CLASS(isascii, ASCII);
>
>  	return rc;
>  }
> --
> 2.39.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] test-ctype: test isascii
  2023-02-11 19:48   ` Junio C Hamano
@ 2023-02-12  9:48     ` René Scharfe
  2023-02-13  3:39       ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: René Scharfe @ 2023-02-12  9:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Masahiro Yamada

Am 11.02.23 um 20:48 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Test the character classifier added by c2e9364a06 (cleanup: add
>> isascii(), 2009-03-07).  It returns 1 for NUL as well, which requires
>> special treatment, as our string-based tester can't find it with
>> strcmp(3).  Allow NUL to be given as the first character in a class
>> specification string.  This has the downside of no longer supporting
>> the empty string, but that's OK since we are not interested in testing
>> character classes with no members.
>
> I wonder how effective a test we can have by checking a table we use
> in production (i.e. ctype.c::sane_ctype[]) against another table we
> use only for testing (i.e. string literals in test-ctype.c), but
> that is not something new in this series.

What aspect is left uncovered?

Or do you mean that the production table should be made trivially
readable to avoid having to test at all?

I on the other hand wonder if we really should add more and more
locale-ignoring classifiers.  Parsing object headers and such sure
require that stability, but parsing commit messages and blob
payloads should perhaps better be done with locale-aware versions
with multi-byte character support.

> I do not offhand know if the string literal prefixed with NUL is
> safe against clever compilers; my gut feeling says it should
> (i.e. allowing such an "optimization" does not seem to have much
> merit), but my gut has been wrong many times in this area, so...

Some compilers do despicable things in the name of optimization, but I
don't see the basis for truncating a string literal at the first NUL.

C99 standard section 6.4.5 (String literals) paragraph 5 has a footnote
that says: "A character string literal need not be a string (see 7.1.1),
because a null character may be embedded in it by a \0 escape sequence."
and 7.1.1 defines: "A string is a contiguous sequence of characters
terminated by and including the first null character."

René

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] test-ctype: test isascii
  2023-02-12  9:48     ` René Scharfe
@ 2023-02-13  3:39       ` Junio C Hamano
  2023-02-13 18:37         ` René Scharfe
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2023-02-13  3:39 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Masahiro Yamada

René Scharfe <l.s.r@web.de> writes:

> Am 11.02.23 um 20:48 schrieb Junio C Hamano:
>> René Scharfe <l.s.r@web.de> writes:
>>
>>> Test the character classifier added by c2e9364a06 (cleanup: add
>>> isascii(), 2009-03-07).  It returns 1 for NUL as well, which requires
>>> special treatment, as our string-based tester can't find it with
>>> strcmp(3).  Allow NUL to be given as the first character in a class
>>> specification string.  This has the downside of no longer supporting
>>> the empty string, but that's OK since we are not interested in testing
>>> character classes with no members.
>>
>> I wonder how effective a test we can have by checking a table we use
>> in production (i.e. ctype.c::sane_ctype[]) against another table we
>> use only for testing (i.e. string literals in test-ctype.c), but
>> that is not something new in this series.
>
> What aspect is left uncovered?
>
> Or do you mean that the production table should be made trivially
> readable to avoid having to test at all?

Much closer to the latter but not quite.

Both tables are not all that transparent, and it feels that the
protection by the tests largely depends on the fact that it is less
likely for us to make the same mistake in two "not so crystal clear"
tables for the same byte.

> ... but parsing commit messages and blob
> payloads should perhaps better be done with locale-aware versions
> with multi-byte character support.

Yes, that does make sense but it is orthogonal to what sane_ctype
wants to address, I would think.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] test-ctype: test isascii
  2023-02-13  3:39       ` Junio C Hamano
@ 2023-02-13 18:37         ` René Scharfe
  2023-02-13 19:02           ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: René Scharfe @ 2023-02-13 18:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Masahiro Yamada

Am 13.02.23 um 04:39 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Am 11.02.23 um 20:48 schrieb Junio C Hamano:
>>> René Scharfe <l.s.r@web.de> writes:
>>>
>>>> Test the character classifier added by c2e9364a06 (cleanup: add
>>>> isascii(), 2009-03-07).  It returns 1 for NUL as well, which requires
>>>> special treatment, as our string-based tester can't find it with
>>>> strcmp(3).  Allow NUL to be given as the first character in a class
>>>> specification string.  This has the downside of no longer supporting
>>>> the empty string, but that's OK since we are not interested in testing
>>>> character classes with no members.
>>>
>>> I wonder how effective a test we can have by checking a table we use
>>> in production (i.e. ctype.c::sane_ctype[]) against another table we
>>> use only for testing (i.e. string literals in test-ctype.c), but
>>> that is not something new in this series.
>>
>> What aspect is left uncovered?
>>
>> Or do you mean that the production table should be made trivially
>> readable to avoid having to test at all?
>
> Much closer to the latter but not quite.
>
> Both tables are not all that transparent, and it feels that the
> protection by the tests largely depends on the fact that it is less
> likely for us to make the same mistake in two "not so crystal clear"
> tables for the same byte.

The test strings for islower() and isupper() I wrote down from memory
long ago, I think.  They should be easily verifiable, like the new ones
for isxdigit().  The one for isascii() is a bit tiring, but verifying it
against the man page which says that the characters from value 0 to 0177
are included should be feasible.  The ones for iscntrl() and ispunct() I
got from their man page.

But when it came to isprint() I got lazy and just copied from ctype.c --
you got me there.  A more intuitive representation could be:

   " " LOWER UPPER DIGIT PUNCT

In my experience having two copies already helps when modifying one of
them -- but at least at some point we better check them against an
external source of truth.

The ctype.c version needs to be fast, so we probably have to make some
concessions to readability.  I'd love to be proven wrong on that,
though.

>> ... but parsing commit messages and blob
>> payloads should perhaps better be done with locale-aware versions
>> with multi-byte character support.
>
> Yes, that does make sense but it is orthogonal to what sane_ctype
> wants to address, I would think.

Currently we can only use one or the other variant because our sane
versions use the same names as the locale-aware ones.  Full overlap
instead of orthogonality.  I don't know if there is a practical impact
besides not recognizing function lines that start with umlauts etc.
for diff hunk headers, though.

René

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] test-ctype: test isascii
  2023-02-13 18:37         ` René Scharfe
@ 2023-02-13 19:02           ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2023-02-13 19:02 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Masahiro Yamada

René Scharfe <l.s.r@web.de> writes:

>> Yes, that does make sense but it is orthogonal to what sane_ctype
>> wants to address, I would think.
>
> Currently we can only use one or the other variant because our sane
> versions use the same names as the locale-aware ones.  Full overlap
> instead of orthogonality.

Ah, that is true but slightly complicated.  As long as the caller of
walks the string byte-by-byte and calls ispunct() in each iteration,
I do not think any "locale aware ispunct()" do much good to us.

Once callers are made locale aware, I am not sure they would still
want to call the isfoo() functions that have been know to be very
much byte-oriented.

Thanks.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 0/3] test-ctype: test all classifiers
  2023-02-11 13:04 [PATCH 0/3] test-ctype: test all classifiers René Scharfe
                   ` (2 preceding siblings ...)
  2023-02-11 13:12 ` [PATCH 3/3] test-ctype: test iscntrl, ispunct, isxdigit and isprint René Scharfe
@ 2023-02-13 21:08 ` René Scharfe
  2023-02-13 21:09   ` [PATCH v2 1/3] test-ctype: test isascii René Scharfe
                     ` (3 more replies)
  3 siblings, 4 replies; 14+ messages in thread
From: René Scharfe @ 2023-02-13 21:08 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Masahiro Yamada

Add tests to cover the remaining character class test functions.

Changes since v1:
- More compact and readable isprint() test string specification by
  reusing the one for ispunct().

  test-ctype: test isascii
  test-ctype: test islower and isupper
  test-ctype: test iscntrl, ispunct, isxdigit and isprint

 t/helper/test-ctype.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

Interdiff against v1:
diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
index d04da0d465..b21bd672d9 100644
--- a/t/helper/test-ctype.c
+++ b/t/helper/test-ctype.c
@@ -33,6 +33,7 @@ static int is_in(const char *s, int ch)
 #define DIGIT "0123456789"
 #define LOWER "abcdefghijklmnopqrstuvwxyz"
 #define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
 #define ASCII \
 	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
 	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
@@ -46,13 +47,6 @@ static int is_in(const char *s, int ch)
 	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
 	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
 	"\x7f"
-#define PRINT \
-	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
-	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
-	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
-	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
-	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
-	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e"

 int cmd__ctype(int argc, const char **argv)
 {
@@ -67,9 +61,9 @@ int cmd__ctype(int argc, const char **argv)
 	TEST_CLASS(islower, LOWER);
 	TEST_CLASS(isupper, UPPER);
 	TEST_CLASS(iscntrl, CNTRL);
-	TEST_CLASS(ispunct, "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~");
+	TEST_CLASS(ispunct, PUNCT);
 	TEST_CLASS(isxdigit, DIGIT "abcdefABCDEF");
-	TEST_CLASS(isprint, PRINT);
+	TEST_CLASS(isprint, LOWER UPPER DIGIT PUNCT " ");

 	return rc;
 }

Range-diff against v1:
1:  f8a7889fb4 = 1:  f8a7889fb4 test-ctype: test isascii
2:  1e0c313b38 = 2:  1e0c313b38 test-ctype: test islower and isupper
3:  dcaffe6917 < -:  ---------- test-ctype: test iscntrl, ispunct, isxdigit and isprint
-:  ---------- > 3:  cb50e581d4 test-ctype: test iscntrl, ispunct, isxdigit and isprint

--
2.39.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 1/3] test-ctype: test isascii
  2023-02-13 21:08 ` [PATCH v2 0/3] test-ctype: test all classifiers René Scharfe
@ 2023-02-13 21:09   ` René Scharfe
  2023-02-13 21:10   ` [PATCH v2 2/3] test-ctype: test islower and isupper René Scharfe
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: René Scharfe @ 2023-02-13 21:09 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Masahiro Yamada

Test the character classifier added by c2e9364a06 (cleanup: add
isascii(), 2009-03-07).  It returns 1 for NUL as well, which requires
special treatment, as our string-based tester can't find it with
strcmp(3).  Allow NUL to be given as the first character in a class
specification string.  This has the downside of no longer supporting
the empty string, but that's OK since we are not interested in testing
character classes with no members.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/helper/test-ctype.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
index 92c4c2313e..caf586649f 100644
--- a/t/helper/test-ctype.c
+++ b/t/helper/test-ctype.c
@@ -11,9 +11,14 @@ static void report_error(const char *class, int ch)

 static int is_in(const char *s, int ch)
 {
-	/* We can't find NUL using strchr.  It's classless anyway. */
+	/*
+	 * We can't find NUL using strchr. Accept it as the first
+	 * character in the spec -- there are no empty classes.
+	 */
 	if (ch == '\0')
-		return 0;
+		return ch == *s;
+	if (*s == '\0')
+		s++;
 	return !!strchr(s, ch);
 }

@@ -28,6 +33,15 @@ static int is_in(const char *s, int ch)
 #define DIGIT "0123456789"
 #define LOWER "abcdefghijklmnopqrstuvwxyz"
 #define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+#define ASCII \
+	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
+	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
+	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
+	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
+	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
+	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"

 int cmd__ctype(int argc, const char **argv)
 {
@@ -38,6 +52,7 @@ int cmd__ctype(int argc, const char **argv)
 	TEST_CLASS(is_glob_special, "*?[\\");
 	TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
 	TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
+	TEST_CLASS(isascii, ASCII);

 	return rc;
 }
--
2.39.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/3] test-ctype: test islower and isupper
  2023-02-13 21:08 ` [PATCH v2 0/3] test-ctype: test all classifiers René Scharfe
  2023-02-13 21:09   ` [PATCH v2 1/3] test-ctype: test isascii René Scharfe
@ 2023-02-13 21:10   ` René Scharfe
  2023-02-13 21:12   ` [PATCH v2 3/3] test-ctype: test iscntrl, ispunct, isxdigit and isprint René Scharfe
  2023-02-13 21:59   ` [PATCH v2 0/3] test-ctype: test all classifiers Junio C Hamano
  3 siblings, 0 replies; 14+ messages in thread
From: René Scharfe @ 2023-02-13 21:10 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Masahiro Yamada

Test the character classifiers added by 43ccdf56ec (ctype: implement
islower/isupper macro, 2012-02-10).

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/helper/test-ctype.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
index caf586649f..8ac76e93e4 100644
--- a/t/helper/test-ctype.c
+++ b/t/helper/test-ctype.c
@@ -53,6 +53,8 @@ int cmd__ctype(int argc, const char **argv)
 	TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
 	TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
 	TEST_CLASS(isascii, ASCII);
+	TEST_CLASS(islower, LOWER);
+	TEST_CLASS(isupper, UPPER);

 	return rc;
 }
--
2.39.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/3] test-ctype: test iscntrl, ispunct, isxdigit and isprint
  2023-02-13 21:08 ` [PATCH v2 0/3] test-ctype: test all classifiers René Scharfe
  2023-02-13 21:09   ` [PATCH v2 1/3] test-ctype: test isascii René Scharfe
  2023-02-13 21:10   ` [PATCH v2 2/3] test-ctype: test islower and isupper René Scharfe
@ 2023-02-13 21:12   ` René Scharfe
  2023-02-13 21:59   ` [PATCH v2 0/3] test-ctype: test all classifiers Junio C Hamano
  3 siblings, 0 replies; 14+ messages in thread
From: René Scharfe @ 2023-02-13 21:12 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Masahiro Yamada

Test the character classifiers added by 1c149ab2dd (ctype: support
iscntrl, ispunct, isxdigit and isprint, 2012-10-15) and 0fcec2ce54
(format-patch: make rfc2047 encoding more strict, 2012-10-18).

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/helper/test-ctype.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
index 8ac76e93e4..b21bd672d9 100644
--- a/t/helper/test-ctype.c
+++ b/t/helper/test-ctype.c
@@ -33,6 +33,7 @@ static int is_in(const char *s, int ch)
 #define DIGIT "0123456789"
 #define LOWER "abcdefghijklmnopqrstuvwxyz"
 #define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
 #define ASCII \
 	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
 	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
@@ -42,6 +43,10 @@ static int is_in(const char *s, int ch)
 	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
 	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
 	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+#define CNTRL \
+	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+	"\x7f"

 int cmd__ctype(int argc, const char **argv)
 {
@@ -55,6 +60,10 @@ int cmd__ctype(int argc, const char **argv)
 	TEST_CLASS(isascii, ASCII);
 	TEST_CLASS(islower, LOWER);
 	TEST_CLASS(isupper, UPPER);
+	TEST_CLASS(iscntrl, CNTRL);
+	TEST_CLASS(ispunct, PUNCT);
+	TEST_CLASS(isxdigit, DIGIT "abcdefABCDEF");
+	TEST_CLASS(isprint, LOWER UPPER DIGIT PUNCT " ");

 	return rc;
 }
--
2.39.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/3] test-ctype: test all classifiers
  2023-02-13 21:08 ` [PATCH v2 0/3] test-ctype: test all classifiers René Scharfe
                     ` (2 preceding siblings ...)
  2023-02-13 21:12   ` [PATCH v2 3/3] test-ctype: test iscntrl, ispunct, isxdigit and isprint René Scharfe
@ 2023-02-13 21:59   ` Junio C Hamano
  3 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2023-02-13 21:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Masahiro Yamada

René Scharfe <l.s.r@web.de> writes:

> Add tests to cover the remaining character class test functions.
>
> Changes since v1:
> - More compact and readable isprint() test string specification by
>   reusing the one for ispunct().

Very nice.  Thanks.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-02-13 21:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-11 13:04 [PATCH 0/3] test-ctype: test all classifiers René Scharfe
2023-02-11 13:12 ` [PATCH 1/3] test-ctype: test isascii René Scharfe
2023-02-11 19:48   ` Junio C Hamano
2023-02-12  9:48     ` René Scharfe
2023-02-13  3:39       ` Junio C Hamano
2023-02-13 18:37         ` René Scharfe
2023-02-13 19:02           ` Junio C Hamano
2023-02-11 13:12 ` [PATCH 2/3] test-ctype: test islower and isupper René Scharfe
2023-02-11 13:12 ` [PATCH 3/3] test-ctype: test iscntrl, ispunct, isxdigit and isprint René Scharfe
2023-02-13 21:08 ` [PATCH v2 0/3] test-ctype: test all classifiers René Scharfe
2023-02-13 21:09   ` [PATCH v2 1/3] test-ctype: test isascii René Scharfe
2023-02-13 21:10   ` [PATCH v2 2/3] test-ctype: test islower and isupper René Scharfe
2023-02-13 21:12   ` [PATCH v2 3/3] test-ctype: test iscntrl, ispunct, isxdigit and isprint René Scharfe
2023-02-13 21:59   ` [PATCH v2 0/3] test-ctype: test all classifiers Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).