git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	schnhrr@cs.tu-berlin.de, rene.scharfe@lsrfire.ath.cx,
	"Johannes Sixt" <j6t@kdbg.org>,
	torvalds@linux-foundation.org,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH] wildmatch: correct isprint and isspace
Date: Thu, 15 Nov 2012 19:19:43 +0700	[thread overview]
Message-ID: <1352981983-22005-1-git-send-email-pclouds@gmail.com> (raw)
In-Reply-To: <1352803572-14547-1-git-send-email-pclouds@gmail.com>

Current isprint() incorrectly includes control characters 9, 10 and
13, which is fixed by this patch.

Current isspace() lacks 11 and 12. But Git's isspace() has been
designed this way since the beginning and has over 100 call sites
relying on this. Instead of updating isspace() behavior (which could be
tricky as patches from other topics may come in parallel that assume
the old isspace()), a new isspace_posix() is introduced and used by
wildmatch.c. Other part of Git can be converted to use this new
function if it seems appropriate.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Sorry for the late response. I'll reply to everybody in one mail.

 On Wed, Nov 14, 2012 at 1:58 AM, "Jan H. Schönherr" <schnhrr@cs.tu-berlin.de> wrote:
 > An alternative to switching from 1-byte to 4-byte values (don't we have
 > a 2-byte datatype?), would be to free up GIT_CNTRL and simply do:
 >
 > #define iscntrl(x) ((x) < 0x20)
 
 No. 127 is also a control character.

 On Wed, Nov 14, 2012 at 2:41 AM, Johannes Sixt <j6t@kdbg.org> wrote:
 > So we have two properties that overlap:
 >
 >       SSSSSSSSSS
 >    CCCCCCCC
 >
 > You seem to generate partions:
 >
 >    XXXYYYYYZZZZZ
 >
 > then assign individual bits to each partition. Now each entry in the
 > lookup table has only one bit set. Then you define isxxx() to check for
 > one of the two possible bits:
 >
 >    iscntrl is X or Y
 >    isspace is Y or Z
 >
 > But shouldn't you just assign one bit for S and another one for C, have
 > entries in the lookup table with more than one bit set, and check for
 > only one bit in the isxxx macro?
 >
 > That way you don't run out of bits as easily as you do with this patch.

 I need three sets of characters actually: control, spaces and
 printable (which contains non-control spaces). Making it
 (isspace(x) && (x) >= 32) is simpler and because isprint() is only used in
 wildmatch, I don't need to think about performance penalty (yet).

 On Thu, Nov 15, 2012 at 2:30 AM, René Scharfe <rene.scharfe@lsrfire.ath.cx> wrote:
 > Nevertheless, it's unfortunate that we have an isspace() that *almost* does
 > what the widely known thing of the same name does.  I'd shy away from
 > changing git's version directly, because it's used more than a hundred times
 > in the code, and estimating the impact of adding \v and \f to it.
 > Perhaps renaming it to isgitspace() is a good first step, followed by
 > adding a "standard" version of isspace() for wildmatch?

 There are just too many call sites of isspace() and there is a risk
 of new call sites coming in independently. So I think keeping isspace()
 as-is and using a different name for the standard version is probably
 a better choice.

 As the new isspace_posix() is only used by wildmatch, its performance
 as of now is not critical and a simple macro like in this patch is
 probably enough. We can optimize it later if we need to.

 git-compat-util.h | 4 +++-
 wildmatch.c       | 8 ++------
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 02f48f6..d4c3fda 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -486,6 +486,7 @@ extern const unsigned char sane_ctype[256];
 #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
 #define isascii(x) (((x) & ~0x7f) == 0)
 #define isspace(x) sane_istest(x,GIT_SPACE)
+#define isspace_posix(x) (((x) >= 9 && (x) <= 13) || (x) == 32)
 #define isdigit(x) sane_istest(x,GIT_DIGIT)
 #define isalpha(x) sane_istest(x,GIT_ALPHA)
 #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
@@ -499,7 +500,8 @@ extern const unsigned char sane_ctype[256];
 #define isxdigit(x) (hexval_table[x] != -1)
 #define isprint(x) (sane_istest(x, GIT_ALPHA | GIT_DIGIT | GIT_SPACE | \
 		GIT_PUNCT | GIT_REGEX_SPECIAL | GIT_GLOB_SPECIAL | \
-		GIT_PATHSPEC_MAGIC))
+		GIT_PATHSPEC_MAGIC) && \
+		(x) >= 32)
 #define tolower(x) sane_case((unsigned char)(x), 0x20)
 #define toupper(x) sane_case((unsigned char)(x), 0)
 #define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC)
diff --git a/wildmatch.c b/wildmatch.c
index 3972e26..fd74efd 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -37,11 +37,7 @@ typedef unsigned char uchar;
 # define ISBLANK(c) ((c) == ' ' || (c) == '\t')
 #endif
 
-#ifdef isgraph
-# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
-#else
-# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
-#endif
+#define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace_posix(c))
 
 #define ISPRINT(c) (ISASCII(c) && isprint(c))
 #define ISDIGIT(c) (ISASCII(c) && isdigit(c))
@@ -50,7 +46,7 @@ typedef unsigned char uchar;
 #define ISCNTRL(c) (ISASCII(c) && iscntrl(c))
 #define ISLOWER(c) (ISASCII(c) && islower(c))
 #define ISPUNCT(c) (ISASCII(c) && ispunct(c))
-#define ISSPACE(c) (ISASCII(c) && isspace(c))
+#define ISSPACE(c) (ISASCII(c) && isspace_posix(c))
 #define ISUPPER(c) (ISASCII(c) && isupper(c))
 #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
 
-- 
1.8.0.rc2.23.g1fb49df

  parent reply	other threads:[~2012-11-15 12:19 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-14  2:34 [PATCH v5 00/12] nd/wildmatch Nguyễn Thái Ngọc Duy
2012-10-14  2:34 ` [PATCH v5 01/12] ctype: make sane_ctype[] const array Nguyễn Thái Ngọc Duy
2012-10-14  2:35 ` [PATCH v5 02/12] ctype: support iscntrl, ispunct, isxdigit and isprint Nguyễn Thái Ngọc Duy
2012-10-14  5:02   ` Junio C Hamano
2012-10-14  5:07     ` Nguyen Thai Ngoc Duy
2012-10-14 12:59   ` René Scharfe
2012-10-14 13:25     ` Nguyen Thai Ngoc Duy
2012-10-14 13:59       ` René Scharfe
2012-10-14 14:26         ` Nguyen Thai Ngoc Duy
2012-10-17 12:09           ` "Jan H. Schönherr"
2012-10-17 12:26             ` Nguyen Thai Ngoc Duy
2012-11-13 10:46             ` [PATCH nd/wildmatch] Correct Git's version of isprint and isspace Nguyễn Thái Ngọc Duy
2012-11-13 18:58               ` "Jan H. Schönherr"
2012-11-13 19:14               ` René Scharfe
2012-11-13 19:15               ` René Scharfe
2012-11-13 19:40                 ` Linus Torvalds
2012-11-13 19:50                   ` Linus Torvalds
2012-11-14 19:30                     ` René Scharfe
2012-11-13 19:41               ` Johannes Sixt
2012-11-15 12:19               ` Nguyễn Thái Ngọc Duy [this message]
2012-11-15 17:13                 ` [PATCH] wildmatch: correct " "Jan H. Schönherr"
2012-11-16  4:19                   ` Nguyen Thai Ngoc Duy
2012-10-14  2:35 ` [PATCH v5 03/12] Import wildmatch from rsync Nguyễn Thái Ngọc Duy
2012-10-14  2:35 ` [PATCH v5 04/12] wildmatch: remove unnecessary functions Nguyễn Thái Ngọc Duy
2012-10-14  5:04   ` Junio C Hamano
2012-10-14  6:29     ` Nguyen Thai Ngoc Duy
2012-10-14  2:35 ` [PATCH v5 05/12] Integrate wildmatch to git Nguyễn Thái Ngọc Duy
2012-10-14  5:06   ` Junio C Hamano
2012-10-14 11:07   ` Torsten Bögershausen
2012-10-14  2:35 ` [PATCH v5 06/12] t3070: disable unreliable fnmatch tests Nguyễn Thái Ngọc Duy
2012-10-14  2:35 ` [PATCH v5 07/12] wildmatch: make wildmatch's return value compatible with fnmatch Nguyễn Thái Ngọc Duy
2012-10-14  5:09   ` Junio C Hamano
2012-10-14  2:35 ` [PATCH v5 08/12] wildmatch: remove static variable force_lower_case Nguyễn Thái Ngọc Duy
2012-10-14  2:35 ` [PATCH v5 09/12] wildmatch: fix case-insensitive matching Nguyễn Thái Ngọc Duy
2012-10-14  2:35 ` [PATCH v5 10/12] wildmatch: adjust "**" behavior Nguyễn Thái Ngọc Duy
2012-10-14  2:35 ` [PATCH v5 11/12] wildmatch: make /**/ match zero or more directories Nguyễn Thái Ngọc Duy
2012-10-14  2:35 ` [PATCH v5 12/12] Support "**" wildcard in .gitignore and .gitattributes Nguyễn Thái Ngọc Duy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1352981983-22005-1-git-send-email-pclouds@gmail.com \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=rene.scharfe@lsrfire.ath.cx \
    --cc=schnhrr@cs.tu-berlin.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).