
Michael Ash's Regex Blog

Regex Musings

Switched at birth

There seems to be another bug in the .NET regex engine concerning Unicode.

The two Unicode categories for quotation marks are swapped: each one matches the characters that belong to the other.

Pi (InitialQuotePunctuation): Indicates that the character is an opening or initial quotation mark. Signified by the Unicode designation "Pi" (punctuation, initial quote).

Pf (FinalQuotePunctuation): Indicates that the character is a closing or final quotation mark. Signified by the Unicode designation "Pf" (punctuation, final quote).

However, \p{Pi} matches a closing quote and \p{Pf} matches an opening quote. That is the reverse of what the definitions say.

I should note that the quote characters in question are the Unicode left and right single and double quotation marks (U+2018, U+2019, U+201C, U+201D), not the ASCII double quote (") or apostrophe (').
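
For reference, the categories that the Unicode Character Database assigns to these four characters can be checked outside .NET. This is a minimal sketch in Python using the standard unicodedata module, not the .NET regex engine; the names CURLY_QUOTES and quote_categories are made up for illustration. It only shows what the standard says, which is the baseline \p{Pi} and \p{Pf} ought to agree with.

```python
import unicodedata

# The four "curly" quotation marks discussed in this post.
# (Hypothetical constant name, chosen for this sketch.)
CURLY_QUOTES = ["\u2018", "\u2019", "\u201C", "\u201D"]


def quote_categories(chars=CURLY_QUOTES):
    """Map each character's code point to its Unicode general category.

    'Pi' means initial (opening) quote punctuation,
    'Pf' means final (closing) quote punctuation.
    """
    return {f"U+{ord(ch):04X}": unicodedata.category(ch) for ch in chars}


if __name__ == "__main__":
    for ch in CURLY_QUOTES:
        # Print code point, general category, and the official character name.
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

Per the standard, the left marks (U+2018, U+201C) are Pi and the right marks (U+2019, U+201D) are Pf, so a correct engine should match the left marks with \p{Pi} and the right marks with \p{Pf}.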

Published Monday, September 13, 2004 11:28 AM by mash