ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:20226] \b and \B with Unicode
@ 2008-12-02 23:46 Dave Thomas
  2008-12-03  0:06 ` [ruby-core:20227] " Radosław Bułat
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Thomas @ 2008-12-02 23:46 UTC
  To: ruby-core

#encoding: utf-8
p "  ∂og ".match(/\b./u)    # =>  #<MatchData "o">


I was surprised that \b didn't find the boundary between the space and  
the Unicode ∂ character. Is that correct?


Dave


* [ruby-core:20227] Re: \b and \B with Unicode
  2008-12-02 23:46 [ruby-core:20226] \b and \B with Unicode Dave Thomas
@ 2008-12-03  0:06 ` Radosław Bułat
  2008-12-03  0:15   ` [ruby-core:20228] " Yukihiro Matsumoto
  2008-12-03  0:30   ` [ruby-core:20229] " Michael Selig
  0 siblings, 2 replies; 5+ messages in thread
From: Radosław Bułat @ 2008-12-03  0:06 UTC
  To: ruby-core

On Wed, Dec 3, 2008 at 12:46 AM, Dave Thomas <dave@pragprog.com> wrote:
> #encoding: utf-8
> p "  ∂og ".match(/\b./u)    # =>  #<MatchData "o">
>
>
> I was surprised that \b didn't find the boundary between the space and the
> Unicode ∂ character. Is that correct?

Maybe ∂ isn't treated as a word character?
"  ∂og ".match(/\w/) => #<MatchData "o">
I don't know if it should or not.

-- 
Pozdrawiam

Radosław Bułat
http://radarek.jogger.pl - mój blog


* [ruby-core:20228] Re: \b and \B with Unicode
  2008-12-03  0:06 ` [ruby-core:20227] " Radosław Bułat
@ 2008-12-03  0:15   ` Yukihiro Matsumoto
  2008-12-03  0:30   ` [ruby-core:20229] " Michael Selig
  1 sibling, 0 replies; 5+ messages in thread
From: Yukihiro Matsumoto @ 2008-12-03  0:15 UTC
  To: ruby-core

Hi,

In message "Re: [ruby-core:20227] Re: \b and \B with Unicode"
    on Wed, 3 Dec 2008 09:06:55 +0900, "Radosław Bułat" <radek.bulat@gmail.com> writes:
|
|On Wed, Dec 3, 2008 at 12:46 AM, Dave Thomas <dave@pragprog.com> wrote:
|> #encoding: utf-8
|> p "  ∂og ".match(/\b./u)    # =>  #<MatchData "o">
|>
|> I was surprised that \b didn't find the boundary between the space and the
|> Unicode ∂ character. Is that correct?
|
|Maybe ∂ isn't treated as a word character?
|"  ∂og ".match(/\w/) => #<MatchData "o">
|I don't know if it should or not.

I think it should.  I suspect it's a bug in Oniguruma.  I will inspect
it later.

							matz.


* [ruby-core:20229] Re: \b and \B with Unicode
  2008-12-03  0:06 ` [ruby-core:20227] " Radosław Bułat
  2008-12-03  0:15   ` [ruby-core:20228] " Yukihiro Matsumoto
@ 2008-12-03  0:30   ` Michael Selig
  2008-12-03  3:54     ` [ruby-core:20234] " Dave Thomas
  1 sibling, 1 reply; 5+ messages in thread
From: Michael Selig @ 2008-12-03  0:30 UTC
  To: ruby-core

On Wed, 03 Dec 2008 11:06:55 +1100, Radosław Bułat <radek.bulat@gmail.com>  
wrote:

> On Wed, Dec 3, 2008 at 12:46 AM, Dave Thomas <dave@pragprog.com> wrote:
>> #encoding: utf-8
>> p "  ∂og ".match(/\b./u)    # =>  #<MatchData "o">
>>
>>
>> I was surprised that \b didn't find the boundary between the space and
>> the Unicode ∂ character. Is that correct?
>
> Maybe ∂ isn't treated as a word character?
> "  ∂og ".match(/\w/) => #<MatchData "o">
> I don't know if it should or not.
>

The character in question is Unicode U+2202, which is "Partial
Differential". Though it looks like it might be a letter, it is NOT
defined as a letter in Unicode: its general category is Sm (Symbol, math),
and it is part of the "Mathematical Operators" block. So I think Ruby is
correct here!
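
One way to check this from Ruby itself is to test the character against
Unicode property classes instead of relying on \w or \b. The sketch below
is only an illustration: it assumes Ruby 1.9 or later with a UTF-8 source
file, and the helper name unicode_kind is made up for this example. It
compares ∂ (U+2202) with the Greek letter δ (U+03B4), which looks similar
but is a real letter.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of Ruby): classify a single character
# by its Unicode general category using \p{...} property classes.
def unicode_kind(ch)
  case ch
  when /\A\p{L}\z/  then :letter       # any Unicode letter
  when /\A\p{Sm}\z/ then :math_symbol  # Symbol, math
  else :other
  end
end

p "∂".ord.to_s(16)   # => "2202"
p unicode_kind("∂")  # => :math_symbol
p unicode_kind("δ")  # => :letter
```

Since ∂ falls in the math-symbol category rather than the letter category,
a regexp engine that follows Unicode has no reason to treat it as a word
character.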

Cheers
Mike


* [ruby-core:20234] Re: \b and \B with Unicode
  2008-12-03  0:30   ` [ruby-core:20229] " Michael Selig
@ 2008-12-03  3:54     ` Dave Thomas
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Thomas @ 2008-12-03  3:54 UTC
  To: ruby-core


On Dec 2, 2008, at 6:30 PM, Michael Selig wrote:

> The character in question is Unicode U+2202 which is "Partial  
> Differential". Though it looks like it might be a letter, it is NOT  
> defined as a letter in Unicode (it is part of the "mathematical  
> operators" block). So I think Ruby is correct here!

Ouch: I just used Option-D on the Mac to generate it.

I'll try again.


Thanks


Dave

