Be careful when dealing with Unicode.
Unicode is the latest attempt to represent the characters of all the different languages in one standard. To quote the Unicode website:
“Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”
Most of us who are marginally familiar with Unicode really only know plane 0. Unicode is divided (so far) into 17 planes of 65,536 code points each. Plane 0 contains the basic characters most of us are familiar with. It’s also the only plane that the current regex engine recognizes. Plane 0 has the most characters defined, although not all 65,536 positions have been assigned. Most of the other planes are empty, but a few (1, 2, and 14, for example) have some characters assigned.
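To make the plane layout concrete, here is a minimal sketch in Python (not the Microsoft engine discussed here). It assumes only that each plane spans 0x10000 code points, so the plane number is the code point shifted right by 16 bits. The helper name plane_of and the sample characters are illustrative choices, not anything taken from the Unicode site.

```python
def plane_of(ch: str) -> int:
    """Return the Unicode plane (0 to 16) of a single character."""
    # Each plane covers 0x10000 (65,536) code points, so the bits above
    # the low 16 give the plane number.
    return ord(ch) >> 16


# Illustrative samples: three from plane 0, one each from planes 1 and 2.
samples = ["A", "\u00FF", "\u20AC", "\U0001D11E", "\U00020000"]
for ch in samples:
    print(f"U+{ord(ch):04X} is in plane {plane_of(ch)}")
```

Anything that reports a plane other than 0 has a code point above FFFF and cannot be written with a single \uXXXX escape.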
The Microsoft regex engine has a bug when dealing with characters in any other plane. When you match with a negated class such as [^\u00FF] (any character except U+00FF), it returns two matches for every character not in plane 0. I think it has something to do with the fact that the Unicode (hex) value of those characters is greater than FFFF, which is the maximum value you can use with the \uXXXX syntax. .NET strings are UTF-16, so each of those characters is stored as a surrogate pair of two 16-bit units, and the engine appears to match each unit separately. The Unicode website does discuss writing a regex engine that deals with Unicode, so the fix may still be in development. Just be aware of it.
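The double match can be imitated outside .NET. The sketch below uses Python, whose re module works on whole code points, and simulates a 16-bit engine by re-expressing the text as one character per UTF-16 code unit. The helper utf16_units is hypothetical and written only for this illustration; it is not how the Microsoft engine is implemented, just a model of matching unit by unit.

```python
import re


def utf16_units(text: str) -> str:
    """Re-express text as one Python character per UTF-16 code unit."""
    raw = text.encode("utf-16-be")  # big-endian, no byte order mark
    return "".join(
        chr(int.from_bytes(raw[i:i + 2], "big")) for i in range(0, len(raw), 2)
    )


pattern = re.compile(r"[^\u00FF]")  # any character except U+00FF
clef = "\U0001D11E"                 # a plane 1 character (musical G clef)

by_code_point = pattern.findall(clef)              # matched as one character
by_code_unit = pattern.findall(utf16_units(clef))  # matched as two surrogates

print(len(by_code_point), len(by_code_unit))  # prints "1 2"
```

Under this model, the two matches are the high and low surrogates (D834 and DD1E for this character), which is consistent with seeing two hits for every character outside plane 0.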