From: Anthony Ramine <n.oxyde@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH v5] wildmatch: properly fold case everywhere
Date: Mon, 3 Jun 2013 01:42:51 +0200
Message-ID: <586F64C2-0F44-4DAB-B91A-DB88C624FEC2@gmail.com>
In-Reply-To: <7vppw4mb77.fsf@alter.siamese.dyndns.org>

Hello Junio,

Replied inline.

Regards,

-- 
Anthony Ramine

On 2 June 2013 at 23:53, Junio C Hamano wrote:

> Anthony Ramine <n.oxyde@gmail.com> writes:
> 
>> Case folding is not done correctly when matching against the [:upper:]
>> character class and uppercased character ranges (e.g. A-Z).
>> Specifically, an uppercase letter fails to match against any of them
>> when case folding is requested because plain characters in the pattern
>> and the whole string are preemptively lowercased to handle the base case
>> fast.
>> 
>> That optimization is kept and ISLOWER() is used in the [:upper:] case
>> when case folding is requested, while matching against a character range
>> is retried with toupper() if the character was lowercase, as the bounds
>> of the range itself cannot be modified (in a case-insensitive context,
>> [A-_] is not equivalent to [a-_]).
>> 
>> Signed-off-by: Anthony Ramine <n.oxyde@gmail.com>
>> Reviewed-by: Duy Nguyen <pclouds@gmail.com>
>> ---
> 
> Thanks.
> 
>> t/t3070-wildmatch.sh | 55 ++++++++++++++++++++++++++++++++++++++++++++++------
>> wildmatch.c          |  7 +++++++
>> 2 files changed, 56 insertions(+), 6 deletions(-)
>> 
>> I added Duy as reviewer and fixed a typo in the commit message reported by
>> Eric Sunshine.
>> 
>> diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
>> index 4c37057..38446a0 100755
>> --- a/t/t3070-wildmatch.sh
>> +++ b/t/t3070-wildmatch.sh
>> @@ -6,20 +6,20 @@ test_description='wildmatch tests'
>> 
>> match() {
>>     if [ $1 = 1 ]; then
>> -	test_expect_success "wildmatch:    match '$3' '$4'" "
>> +	test_expect_success "wildmatch:     match '$3' '$4'" "
>> 	    test-wildmatch wildmatch '$3' '$4'
>> 	"
>>     else
>> -	test_expect_success "wildmatch: no match '$3' '$4'" "
>> +	test_expect_success "wildmatch:  no match '$3' '$4'" "
>> 	    ! test-wildmatch wildmatch '$3' '$4'
>> 	"
>>     fi
>>     if [ $2 = 1 ]; then
>> -	test_expect_success "fnmatch:      match '$3' '$4'" "
>> +	test_expect_success "fnmatch:       match '$3' '$4'" "
>> 	    test-wildmatch fnmatch '$3' '$4'
>> 	"
>>     elif [ $2 = 0 ]; then
>> -	test_expect_success "fnmatch:   no match '$3' '$4'" "
>> +	test_expect_success "fnmatch:    no match '$3' '$4'" "
>> 	    ! test-wildmatch fnmatch '$3' '$4'
>> 	"
>> #    else
> 
> Is the above about aligning $3/$4 across checks of different types
> (i.e. purely cosmetic)?  I am not complaining; just making sure if
> there is nothing deeper going on.

Yes, purely cosmetic.

> It is outside the scope of this change, but the shell style of this
> script (most notably use of [] instead of test) needs to be fixed
> someday, preferably soon, including the commented out else clause
> at the end of the hunk.
> 
>> @@ -235,4 +247,35 @@ pathmatch 1 abcXdefXghi '*X*i'
>> pathmatch 1 ab/cXd/efXg/hi '*/*X*/*/*i'
>> pathmatch 1 ab/cXd/efXg/hi '*Xg*i'
>> 
>> +# Case-sensitivy features
>> +match 0 x 'a' '[A-Z]'
>> +match 1 x 'A' '[A-Z]'
>> +match 0 x 'A' '[a-z]'
>> +match 1 x 'a' '[a-z]'
>> +match 0 x 'a' '[[:upper:]]'
>> +match 1 x 'A' '[[:upper:]]'
>> +match 0 x 'A' '[[:lower:]]'
>> +match 1 x 'a' '[[:lower:]]'
>> +match 0 x 'A' '[B-Za]'
>> +match 1 x 'a' '[B-Za]'
>> +match 0 x 'A' '[B-a]'
>> +match 1 x 'a' '[B-a]'
>> +match 0 x 'z' '[Z-y]'
>> +match 1 x 'Z' '[Z-y]'
>> +
>> +imatch 1 'a' '[A-Z]'
> 
> Do we want "# Case-insensitivity features" comment here as well?

I don't particularly care; some sections have titles, some don't.

>> +imatch 1 'A' '[A-Z]'
>> +imatch 1 'A' '[a-z]'
>> +imatch 1 'a' '[a-z]'
>> +imatch 1 'a' '[[:upper:]]'
>> +imatch 1 'A' '[[:upper:]]'
>> +imatch 1 'A' '[[:lower:]]'
>> +imatch 1 'a' '[[:lower:]]'
>> +imatch 1 'A' '[B-Za]'
>> +imatch 1 'a' '[B-Za]'
>> +imatch 1 'A' '[B-a]'
>> +imatch 1 'a' '[B-a]'
>> +imatch 1 'z' '[Z-y]'
>> +imatch 1 'Z' '[Z-y]'
> 
>> +
>> test_done
>> diff --git a/wildmatch.c b/wildmatch.c
>> index 7192bdc..f91ba99 100644
>> --- a/wildmatch.c
>> +++ b/wildmatch.c
>> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>> 					}
>> 					if (t_ch <= p_ch && t_ch >= prev_ch)
>> 						matched = 1;
>> +					else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) {
>> +						uchar t_ch_upper = toupper(t_ch);
>> +						if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
>> +							matched = 1;
>> +					}
>> 					p_ch = 0; /* This makes "prev_ch" get set to 0. */
> 
> Hmm, this looks somewhat strange.
> 
> * At the beginning of the outermost "per characters in the text"
>   loop, we seem to downcase t_ch when WM_CASEFOLD is set.
> * Also at the same place, we also seem to downcase p_ch under the
>   same condition.
> 
> which makes me wonder why the fix is not like this:
> 
> 	+	if (flags & WM_CASEFOLD) {
>        +		if (ISUPPER(p_ch))
>        +			p_ch = tolower(p_ch);
>        +		if (prev_ch && ISUPPER(prev_ch))
>        +			prev_ch = tolower(prev_ch);
> 	+	}
> 		if (t_ch <= p_ch && t_ch >= prev_ch)
> 			matched = 1;
> 		p_ch = 0; /* This sets "prev_ch" to 0 */
> 
> 
> Ahh, OK, the "seemingly strange" construct is about handling a range
> like "[Z-y]"; we do not want to upcase or downcase p_ch/prev_ch and
> make the range "[z-y]" (empty), which would exclude bytes like "^",
> "_" or even "Z".
> 
> And it is also OK to downcase p_ch in a single-letter case, not the
> characters in a range, at the beginning of the outermost loop,
> because we always compare for equality against t_ch (which is
> downcased) in that case.

Yeah, I tried to explain that in the commit message, but it is surely too dense.
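
For what it is worth, the range case can be sketched in isolation roughly as
follows. This is a standalone illustration and not the wildmatch.c code:
in_range_fold(), in_range_naive() and range_examples() are made-up names, it
uses <ctype.h> instead of git's own ISLOWER()/toupper() wrappers, and it
assumes ASCII and that the caller has already lowercased the text character
when folding is requested.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Hypothetical helper: is t_ch inside the range [lo-hi]?
 * When fold is set, t_ch is assumed to be already lowercased, so a
 * lowercase t_ch gets a second try as uppercase against the same,
 * unmodified bounds.
 */
static int in_range_fold(unsigned char t_ch, unsigned char lo,
			 unsigned char hi, int fold)
{
	if (t_ch >= lo && t_ch <= hi)
		return 1;
	if (fold && islower(t_ch)) {
		unsigned char up = (unsigned char)toupper(t_ch);
		return up >= lo && up <= hi;
	}
	return 0;
}

/* The alternative that does not work: lowercase the bounds themselves. */
static int in_range_naive(unsigned char t_ch, unsigned char lo,
			  unsigned char hi)
{
	lo = (unsigned char)tolower(lo);
	hi = (unsigned char)tolower(hi);
	return t_ch >= lo && t_ch <= hi;
}

static void range_examples(void)
{
	/* [Z-y]: 'z' is above 'y', but its uppercase form 'Z' is the lower bound. */
	assert(in_range_fold('z', 'Z', 'y', 1));
	assert(!in_range_fold('z', 'Z', 'y', 0));
	/* '_' sits between 'Z' and 'y' and has to keep matching. */
	assert(in_range_fold('_', 'Z', 'y', 1));
	/* Lowercasing the bounds turns [Z-y] into the empty range [z-y]. */
	assert(!in_range_naive('_', 'Z', 'y'));
	assert(!in_range_naive('z', 'Z', 'y'));
}
```

The retry only happens after the plain comparison has failed, so ranges that
already match pay nothing extra.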

>> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>> 					} else if (CC_EQ(s,i, "upper")) {
>> 						if (ISUPPER(t_ch))
>> 							matched = 1;
>> +						else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch))
>> +							matched = 1;
> 
> This also looks somewhat strange but correct in that t_ch is already
> downcased so we do not need a corresponding change for CC_EQ("lower")
> codepath.

I chose to still lowercase everything first, to keep the common path as is.
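
A rough sketch of what that means for the character classes, again as a
standalone illustration rather than the real dowild() code:
matches_upper_class(), matches_lower_class() and class_examples() are
hypothetical names, and <ctype.h> stands in for git's ISUPPER()/ISLOWER()
macros. Because the text character is lowercased up front when folding,
[:upper:] has to accept lowercase letters too, while [:lower:] needs no
special case.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: does text_ch match [:upper:]? */
static int matches_upper_class(unsigned char text_ch, int fold)
{
	/* Mirror the up-front lowercasing done when folding is requested. */
	unsigned char t_ch = fold ? (unsigned char)tolower(text_ch) : text_ch;

	if (isupper(t_ch))
		return 1;
	/* After folding, an originally uppercase letter looks lowercase. */
	return fold && islower(t_ch);
}

/* Hypothetical helper: does text_ch match [:lower:]? */
static int matches_lower_class(unsigned char text_ch, int fold)
{
	unsigned char t_ch = fold ? (unsigned char)tolower(text_ch) : text_ch;

	return islower(t_ch) != 0;
}

static void class_examples(void)
{
	/* Case-sensitive: only real uppercase letters match [:upper:]. */
	assert(matches_upper_class('A', 0));
	assert(!matches_upper_class('a', 0));
	/* Case-insensitive: both cases match both classes. */
	assert(matches_upper_class('A', 1));
	assert(matches_upper_class('a', 1));
	assert(matches_lower_class('A', 1));
	assert(matches_lower_class('a', 1));
	/* Non-letters never match either class. */
	assert(!matches_upper_class('1', 1));
	assert(!matches_lower_class('1', 1));
}
```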

> Interesting.  Will apply.
> 
> Thanks.

Great. You're welcome.
