From: Duy Nguyen <pclouds@gmail.com>
To: Anthony Ramine <n.oxyde@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v3] wildmatch: properly fold case everywhere
Date: Wed, 29 May 2013 20:52:07 +0700
Message-ID: <CACsJy8A61nYu9a-BhUiBhBEv-e6_CtYyZE3sG9iCiau+3EKVdw@mail.gmail.com>
In-Reply-To: <4E816EBA-A22D-4507-BED0-0DE55D2E619C@gmail.com>
On Wed, May 29, 2013 at 8:37 PM, Anthony Ramine <n.oxyde@gmail.com> wrote:
> On 29 May 2013 at 15:22, Duy Nguyen wrote:
>
>> On Tue, May 28, 2013 at 8:58 PM, Anthony Ramine <n.oxyde@gmail.com> wrote:
>>> Case folding is not done correctly when matching against the [:upper:]
>>> character class and uppercased character ranges (e.g. A-Z).
>>> Specifically, an uppercase letter fails to match against any of them
>>> when case folding is requested because plain characters in the pattern
>>> and the whole string are preemptively lowercased to handle the base case
>>> fast.
>>
>> I did a little test with glibc fnmatch and also checked the source
>> code. I don't think 'a' matches [:upper:]. So I'm not sure whether that is
>> correct behavior or a bug in glibc. The spec is not clear (I think) on
>> this. I guess we should just assume that 'a' should match '[:upper:]'?
>
> I don't know; in my opinion, if case folding is enabled, [:upper:], [:lower:] and [:alpha:] should be equivalent.
>
> This opinion is shared by GNU Flex [1]:
>
>> • If your scanner is case-insensitive (the ‘-i’ flag), then ‘[:upper:]’ and ‘[:lower:]’ are equivalent to ‘[:alpha:]’.
>
> [1] http://flex.sourceforge.net/manual/Patterns.html
Then we should do it too because of this precedent, I think.
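A minimal sketch of the [:upper:] problem and the Flex-style rule, assuming (as the patch description says) that under case folding the text character is lowercased before the class check. The names `upper_class_old`, `upper_class_folded` and the `CASEFOLD` flag are hypothetical stand-ins for the wildmatch.c code and `WM_CASEFOLD`; the standard <ctype.h> functions stand in for git's `ISUPPER`/`ISLOWER` macros.

```c
#include <ctype.h>

/* Hypothetical flag standing in for WM_CASEFOLD. */
#define CASEFOLD 1

/* Old behaviour: the text char has already been lowercased, so under
 * case folding no letter can ever satisfy [:upper:]. */
static int upper_class_old(unsigned char t_ch, int flags)
{
	if (flags & CASEFOLD)
		t_ch = (unsigned char)tolower(t_ch);
	return isupper(t_ch) != 0;
}

/* Proposed behaviour: under case folding a (lowercased) letter also
 * satisfies [:upper:], making it equivalent to [:alpha:]. */
static int upper_class_folded(unsigned char t_ch, int flags)
{
	if (flags & CASEFOLD)
		t_ch = (unsigned char)tolower(t_ch);
	if (isupper(t_ch))
		return 1;
	return (flags & CASEFOLD) && islower(t_ch);
}
```

With `CASEFOLD` set, `upper_class_old` rejects both 'A' and 'a', while `upper_class_folded` accepts both; without the flag, only 'A' matches, as before.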
>>> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>>> }
>>> if (t_ch <= p_ch && t_ch >= prev_ch)
>>> matched = 1;
>>> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) {
>>> + uchar t_ch_upper = toupper(t_ch);
>>> + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
>>> + matched = 1;
>>> + }
>>
>> Or we could stick to tolower. Something like this:
>>
>>         if ((t_ch <= p_ch && t_ch >= prev_ch) ||
>>             ((flags & WM_CASEFOLD) &&
>>              t_ch <= tolower(p_ch) && t_ch >= tolower(prev_ch)))
>>                 matched = 1;
>>
>> I think it's easier to read if we either downcase all, or upcase all, not both.
>
> If the range to match against is [A-_], it will become [a-_], which is an empty range because ord('a') > ord('_'). I think it is simpler to reuse toupper() after the fact, as I did.
>
> Anyway maybe I should add a test for that corner case?
Yeah, I was thinking about such a case, but I saw glibc do it... I
guess we just found another bug, at least in compat/fnmatch.c. Yes, a
test for it would be great, in case I change my mind two years from now
and decide to turn it the other way ;)
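The [A-_] corner case can be checked with a small sketch comparing the two approaches discussed above. It assumes case folding is on and that the text character has already been lowercased; the function names are hypothetical and only mimic the range test inside dowild().

```c
#include <ctype.h>

/* Lowercase-the-endpoints approach: breaks when the range spans the
 * gap between 'Z' and 'a', e.g. [A-_] becomes the empty range [a-_]
 * because 'a' (97) > '_' (95). */
static int range_match_lower_ends(unsigned char t_ch,
				  unsigned char lo, unsigned char hi)
{
	if (t_ch >= lo && t_ch <= hi)
		return 1;
	return t_ch >= (unsigned char)tolower(lo) &&
	       t_ch <= (unsigned char)tolower(hi);
}

/* Retry-with-toupper approach, as in the patch: if the (already
 * lowercased) text char misses the literal range, try its uppercase
 * form against the unmodified endpoints. */
static int range_match_upper_retry(unsigned char t_ch,
				   unsigned char lo, unsigned char hi)
{
	if (t_ch >= lo && t_ch <= hi)
		return 1;
	if (islower(t_ch)) {
		unsigned char up = (unsigned char)toupper(t_ch);
		return up >= lo && up <= hi;
	}
	return 0;
}
```

For 'b' against [A-_], the first function fails while the second matches via 'B' (66); for an ordinary range such as [A-Z] both agree.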
>
>>> p_ch = 0; /* This makes "prev_ch" get set to 0. */
>>> } else if (p_ch == '[' && p[1] == ':') {
>>> const uchar *s;
>>> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>>> } else if (CC_EQ(s,i, "upper")) {
>>> if (ISUPPER(t_ch))
>>> matched = 1;
>>> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch))
>>> + matched = 1;
>>> } else if (CC_EQ(s,i, "xdigit")) {
>>> if (ISXDIGIT(t_ch))
>>> matched = 1;
>>
>> If WM_CASEFOLD is set, maybe isalpha(t_ch) is enough then?
>
> Yes, isalpha() is enough, but I wanted to keep the two cases separate. I can amend that if you want.
Either way is fine. I don't think this code is performance critical. Your call.
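A quick sanity check that the two forms are interchangeable, sketched with the standard <ctype.h> functions rather than git's own macros (an assumption). In the "C" locale the C standard guarantees that isalpha() is true exactly for characters where isupper() or islower() is true, so the loop below should find no disagreement over byte values 0..255.

```c
#include <ctype.h>

/* [:upper:] under case folding, written as two separate checks. */
static int upper_casefold_separate(int c)
{
	return isupper(c) || islower(c);
}

/* The suggested simplification: a single isalpha() call. */
static int upper_casefold_alpha(int c)
{
	return isalpha(c) != 0;
}

/* Counts byte values on which the two variants disagree
 * (0 means they are interchangeable in the current locale). */
static int count_mismatches(void)
{
	int c, n = 0;

	for (c = 0; c <= 255; c++)
		if (upper_casefold_separate(c) != upper_casefold_alpha(c))
			n++;
	return n;
}
```

The program never calls setlocale(), so it runs in the "C" locale; under other locales the guarantee above does not apply.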
--
Duy