
Michael Ash's Regex Blog

Regex Musings

Unicode problem fixed

A little while back I blogged about the Unicode character class \p{name}. I made the point that the alpha properties didn't work properly. Well, it seems that is no longer true. Before, I had tested with The Regulator and RegexLib, and the expression failed with both. After getting the latest copy of The Regulator I tried again, and this time the expressions worked. I then tested against another .NET site and the expression failed like before. Apparently the problem was fixed in some .NET update, so the latest updates are required for this to work.

\p{L} matches letters. Not just English a-zA-Z but all letters in different languages. A second letter can be added after the L to specify case.

\p{Ll} matches lower case letters, e.g. a, b, c, á, d, p, π

\p{Lu} matches upper case letters, e.g. A, B, Þ, Σ, G
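
As a rough illustration of the three classes above, here is a minimal sketch in TypeScript rather than .NET. The helper name `findAll` and the sample string are made up for this example, and JavaScript engines only accept \p{...} when the `u` flag is set, so this shows the classes themselves and not the .NET behaviour discussed in this post.

```typescript
// Unicode property escapes require the "u" flag in JavaScript/TypeScript.
// The patterns are built from strings, so each backslash is doubled.
const anyLetter = new RegExp("\\p{L}", "gu");   // any letter, any language
const lowerLetter = new RegExp("\\p{Ll}", "gu"); // lower case letters only
const upperLetter = new RegExp("\\p{Lu}", "gu"); // upper case letters only

// Hypothetical helper: return every match of a global pattern in the text.
function findAll(pattern: RegExp, text: string): string[] {
  return text.match(pattern) || [];
}

// Illustrative input: three lower case letters, three upper case, a digit and a symbol.
const sample = "a \u00E1 \u03C0 A \u00DE \u03A3 7 !"; // a á π A Þ Σ 7 !

console.log(findAll(anyLetter, sample));   // all six letters, no digit or symbol
console.log(findAll(lowerLetter, sample)); // a, á, π
console.log(findAll(upperLetter, sample)); // A, Þ, Σ
```

The digit and the exclamation mark are skipped by all three patterns because neither belongs to the letter category.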

 

I've posted examples using these on RegexLib:

http://www.regexlib.com/REDetails.aspx?regexp_id=706

http://www.regexlib.com/REDetails.aspx?regexp_id=707

In the samples I've wrapped the character class inside word boundaries and made it case sensitive, just to make it clearer how these work.
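
The RegexLib entries themselves aren't reproduced here, so the following is only a sketch of the same idea in TypeScript, not the posted expressions. One assumption to flag loudly: \b in JavaScript only treats ASCII characters as word characters, so this sketch substitutes lookarounds on \p{L} for the word boundaries. The names `lowerWord`, `upperWord` and `wholeWords` are invented for the example.

```typescript
// Whole words made up entirely of lower case (or upper case) letters.
// Lookarounds on \p{L} stand in for \b, which is ASCII-only in JavaScript.
// No "i" flag is used, so matching stays case sensitive.
const lowerWord = new RegExp("(?<!\\p{L})\\p{Ll}+(?!\\p{L})", "gu");
const upperWord = new RegExp("(?<!\\p{L})\\p{Lu}+(?!\\p{L})", "gu");

// Hypothetical helper: return every whole-word match in the text.
function wholeWords(pattern: RegExp, text: string): string[] {
  return text.match(pattern) || [];
}

// Illustrative input mixing all-lower, all-upper and mixed-case words.
const words = "\u00E9t\u00E9 \u00C9T\u00C9 \u00C9t\u00E9 abc ABC"; // été ÉTÉ Été abc ABC

console.log(wholeWords(lowerWord, words)); // été, abc
console.log(wholeWords(upperWord, words)); // ÉTÉ, ABC
```

The mixed-case word Été matches neither pattern, which is the point of anchoring the class at both ends: a run of one case only counts when it is the entire word.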

Published Thursday, May 06, 2004 12:31 AM by mash
