<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://regexadvice.com/utility/FeedStylesheets/atom.xsl" media="screen"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><title type="html">Michael Ash's Regex Blog</title><subtitle type="html">Regex Musings</subtitle><id>http://regexadvice.com/blogs/mash/atom.aspx</id><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/default.aspx" /><link rel="self" type="application/atom+xml" href="http://regexadvice.com/blogs/mash/atom.aspx" /><generator uri="http://communityserver.org" version="2.1.60809.935">Community Server</generator><updated>2004-09-13T11:28:00Z</updated><entry><title>Update to CSS Minification</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2008/04/27/Update-to-CSS-Minification.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2008/04/27/Update-to-CSS-Minification.aspx</id><published>2008-04-27T22:06:00Z</published><updated>2008-04-27T22:06:00Z</updated><content type="html">This is a C# 2.0 enhancement of a C# port of YUI Compressor&amp;#39;s CSS minification code. I got a little carried away with ideas for this; they were all regex based, which really is what motivated me to work on it. However, after I thought I was done I learned not everything worked. It did what I wanted it to do, but what I wanted wasn&amp;#39;t the correct thing. I really should have just stopped with my original ideas. 
The last idea for my original changes was to take 2 or more individual subset properties...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2008/04/27/Update-to-CSS-Minification.aspx"&gt;read more&lt;/a&gt;)</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Follow up to Additional CSS minifying regex patterns</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2008/04/18/Follow-up-to-Additional-CSS-minifying-regex-patterns.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2008/04/18/Follow-up-to-Additional-CSS-minifying-regex-patterns.aspx</id><published>2008-04-18T20:04:00Z</published><updated>2008-04-18T20:04:00Z</updated><content type="html">OK, these regexes were discussed in the previous post; this is mostly just their application. This is a C# 2.0 enhancement of a C# port of YUI Compressor&amp;#39;s CSS minification code. Since I was doing this in C# I took full advantage of its regex engine, namely using lookbehinds and delegates for some replaces. Almost all the regexes after the &amp;quot;New Test&amp;quot; comment are the new or modified regexes from the ported version. 
There are also one new and two modified expressions before that comment....(&lt;a href="http://regexadvice.com/blogs/mash/archive/2008/04/18/Follow-up-to-Additional-CSS-minifying-regex-patterns.aspx"&gt;read more&lt;/a&gt;)</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Additional CSS minifying regex patterns</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2008/03/27/Additional-CSS-minifying-regex-patterns.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2008/03/27/Additional-CSS-minifying-regex-patterns.aspx</id><published>2008-03-27T20:55:00Z</published><updated>2008-03-27T20:55:00Z</updated><content type="html">NOTE: All the regexes referenced on this page that were written by me use IgnoreCase = true. I was looking at the regexes used in the YUI Compressor to minify CSS and came up with a couple more that I think could help the process. The code and port I was looking at were already trimming unneeded zeros used for the top, right, bottom, left values with a simple string replace. But there were three separate replaces being done. 
It was pretty simple to come up with a regex to handle all the cases (Pseudo...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2008/03/27/Additional-CSS-minifying-regex-patterns.aspx"&gt;read more&lt;/a&gt;)</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>A touch of Character Class</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2008/01/31/A-touch-of-Character-Class.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2008/01/31/A-touch-of-Character-Class.aspx</id><published>2008-01-31T16:49:00Z</published><updated>2008-01-31T16:49:00Z</updated><content type="html">The square brackets character class is one of the more misunderstood of the basic regex features. This feature is supported in virtually all regex implementations. In fact, off the top of my head I don&amp;#39;t know an implementation that doesn&amp;#39;t support it. Maybe it&amp;#39;s not well documented in most tutorials, or maybe the samples are not clear enough, or maybe users are just skimming over the details for this one, but I see this feature misused quite often. 
This type of character class simply matches...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2008/01/31/A-touch-of-Character-Class.aspx"&gt;read more&lt;/a&gt;)</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Remember where you come from</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2007/10/01/Remember-where-you-come-from.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2007/10/01/Remember-where-you-come-from.aspx</id><published>2007-10-01T06:42:00Z</published><updated>2007-10-01T06:42:00Z</updated><content type="html">One thing I&amp;#39;ve noticed among rookie regex users is that they focus way too much on what they are trying to match and not on where they are trying to match it from. There is a tendency to drastically underestimate the importance of their source data, which they are trying to apply their regex against. I see this all the time on forums where posters asking for help almost never post any sample data, and most of the few that do post a sample they made up on the spot. Over on the RegexAdvice construction...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2007/10/01/Remember-where-you-come-from.aspx"&gt;read more&lt;/a&gt;)</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Are you ready for regex?</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx</id><published>2007-06-01T17:02:00Z</published><updated>2007-06-01T17:02:00Z</updated><content type="html">

&lt;p style="margin-bottom:0in;"&gt;Who should be using regular
expressions?&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;This has been on my mind for a while:
there are some people who shouldn&amp;#39;t be using regular expressions. People
who don&amp;#39;t know what regular expressions are for.&lt;img src="http://regexadvice.com/emoticons/emotion-18.gif" alt="Huh?" /&gt;  Now you can use
regexes in a variety of ways, and you can debate either side on
whether certain applications are good uses or bad uses, but that&amp;#39;s
not what I&amp;#39;m talking about.  I&amp;#39;m not even talking about people who
don&amp;#39;t know how to write regexes, well or at all.  I&amp;#39;m talking about
people trying to use regular expressions who have no idea what
regular expressions do.  And I don&amp;#39;t mean that you can&amp;#39;t decipher a
regex pattern by looking at it.  I mean you don&amp;#39;t understand the
general concept that regular expressions &lt;b&gt;&lt;u&gt;match&lt;/u&gt;&lt;/b&gt;&lt;span&gt;&lt;span style="text-decoration:none;"&gt;
patterns in data.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;I&amp;#39;ve seen several posts over on the
regexadvice.com forums in the past few months which go something like
&amp;ldquo;I have this problem and someone told me I should use a regular
expression.&amp;rdquo; Unless that same someone bothered to explain what a
regular expression does or why it is suited to your task, don&amp;#39;t be
in such a hurry to plug in a regex.&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;  I&amp;#39;m not saying you need to master
regexes before using them; just get your head around the basics
(matching).  At the very least you should think of regexes as wildcard
searches, just more powerful. If you are at least that far, proceed
with trying to implement your regex.  But if you are thinking of
something completely different, slow down.  Unless you have a
basic understanding of what regular expressions do, even simple tasks
become way more difficult than they should be.  And if you get
someone to help you write a regex but still don&amp;#39;t know what it&amp;#39;s
doing, you are likely to keep trying to use regexes incorrectly. 
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;A common result of this lack of
understanding that I&amp;#39;ve been seeing is people trying to perform a task solely by
writing a regex pattern, when regexes themselves have no capacity to
perform it. Not that a regex couldn&amp;#39;t be a part of the solution, but in
the end it&amp;#39;s not what will do what the person wants done.
Usually it&amp;#39;s a coding task, and the regex can at least do some of the
prep work.  Even if some implementations of the regex
engine allow code to be called, it is still the code doing the heavy
lifting.  The regex is only finding the data.   People could save
themselves hours of wasted time if they just understood the basics of
what they are getting themselves into.&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;Another thing I&amp;#39;ve seen, related to
not having a basic understanding, is a task that is not only
perfectly suited to a regex but a very common application of
one, and the person asking the question asks &amp;ldquo;Is this
something that a regex can do?&amp;rdquo;  Of the issues I&amp;#39;ve mentioned this one is
both the most and the least understandable.  The most because regular
expressions are not simple for everyone; to some they come easily, to
others they never come completely. And some of the documentation isn&amp;#39;t
the greatest, though there is plenty of good documentation out there these days.  So I can understand why some can&amp;#39;t get their heads
around exactly how to write a regex.  But it is the least understandable
when they are asking how to write a regex so common, and usually so
simple, that dozens of tutorials and/or articles on regexes use the
very regex they are asking for as an example.&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;  I think the main problems of people
who fall into this category are:
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;&lt;p style="margin-bottom:0in;"&gt;Lack of research.  There are more
	than enough tutorials out on the web and more than a few books that
	have simple samples to give you an idea of what a regex does. I find
	it very hard to believe when someone says they couldn&amp;#39;t find a regex
	for basic (US) zip codes. Did you even look? Or are you only using a
	regex for this task because someone told you that you should, so you ran out to find someone to write it for you? Or are you just cheating on your homework? 
	&lt;/p&gt;
	&lt;/li&gt;&lt;li&gt;&lt;p style="margin-bottom:0in;"&gt;Misunderstanding what regular
	expressions understand.  There seem to be more than a few people
	who think a regex understands the context of the data it matches: 
	that it not only knows how to match the data but also knows what it
	means, either in the context of their application or to the world at
	large. Those are the people who think a regex for matching zip codes
	knows what zip codes are used for and where they would be used. 
	Sorry, but that&amp;#39;s not the case.  Regexes understand nothing of what
	your data means to you.  As far as the regex engine is concerned
	it&amp;#39;s just a string of characters.  It&amp;#39;s up to the regex author to know
	the context of the data they want to match and to shape the regex
	accordingly to return only the relevant data.  This may make it seem
	like the regex understands the data it&amp;#39;s matching, but that&amp;#39;s not what&amp;#39;s
	happening.  What happened is that the person who wrote the regex understood
	the data and the problem set so well they were able to construct a
	regex that only matches the relevant values.&lt;/p&gt;
	&lt;/li&gt;&lt;li&gt;&lt;p style="margin-bottom:0in;"&gt;Thinking that regex is a full-blown
	programming language.  It&amp;#39;s not.  I&amp;#39;ve seen questions about wanting
	a regex to compare numbers, tell time or do some other function
	completely outside its realm, something most programming
	languages have a feature to deal with or let you write code that
	can.  I&amp;#39;ve never seen any regex documentation promoting such
	features, so I can&amp;#39;t imagine why someone thinks a regex can perform
	these tasks, other than my previous point, where they saw the results
	of a well-written regex and speculated on what and how much work the
	regex did. As I mentioned, some implementations allow you to perform
	function calls, but that is more an add-on of the programming
	environment you are using than a generic regex feature.  Realize
	that regular expressions are one of many features of your
	programming language, not the other way around. 
	&lt;/p&gt;
&lt;/li&gt;&lt;/ol&gt;
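&lt;p style="margin-bottom:0in;"&gt;To make the last two points concrete, here is a minimal sketch. It assumes Python and its re module, which the post itself does not use, and the sample strings and variable names are made up for illustration. The zip code pattern happily returns an order number because it only matches a shape, and the number comparison is done by ordinary code after the regex has found the digits.&lt;/p&gt;

```python
import re

# Basic US zip code shape: five digits, optionally a hyphen and four more.
zip_pattern = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# The engine only sees characters: the order number has the same shape
# as a zip code, so both values are returned.
zip_like = zip_pattern.findall("Order 12345 shipped to 90210")

# Comparing numbers is a job for ordinary code; the regex only finds the digits.
numbers = [int(n) for n in re.findall(r"\d+", "12 apples and 300 pears")]
largest = max(numbers)
```

&lt;p style="margin-bottom:0in;"&gt;Telling the order number apart from the zip code would take a pattern shaped around the surrounding text, which is exactly the context only the regex author knows.&lt;/p&gt;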
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;If you&amp;#39;ve read this far and you didn&amp;#39;t
know what regular expressions did or didn&amp;#39;t do before, hopefully by
now you have some idea.  And if you already knew what they did, just
stop to consider the next time you advise someone to use a regex:
make sure you get across the high-level point that you are &amp;ldquo;matching
something (a character pattern) in a string&amp;rdquo; before you get into
the more complex aspects of what a regex can do.&lt;/p&gt;
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>You've got your sub-matches in my matches</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2007/06/01/You_2700_ve-got-your-sub_2D00_matches-in-my-matches.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2007/06/01/You_2700_ve-got-your-sub_2D00_matches-in-my-matches.aspx</id><published>2007-06-01T16:47:00Z</published><updated>2007-06-01T16:47:00Z</updated><content type="html">
&lt;p style="margin-bottom:0in;"&gt;Hello boys and girls.  Wow, it&amp;#39;s been a
while since I&amp;#39;ve done this.  I want to touch on a very useful but
often overlooked feature of regex: grouping.  While I haven&amp;#39;t been
blogging I have been active on a message board here or there.  A
question I see quite often is &amp;ldquo;I want to find a match in a string
but I don&amp;#39;t want part of the match&amp;rdquo; or &amp;ldquo;I need the value of this
portion of the string.&amp;rdquo;  Now often I see solutions to these types of
questions that involve look-arounds.  And while they certainly work,
they aren&amp;#39;t the only way to achieve the desired results. New regex
users seem to believe they can only access the full match.  Most
regex engines support groups, which let you access a
certain portion of the full match.  Groups are identified by
parentheses.  Every pair of plain parentheses is a group.  For
example, take the regex pattern&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;&lt;code&gt;/&lt;span style="color:green;"&gt;(Hello)\x20(world)&lt;/span&gt;/&lt;/code&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;There are two groups in the regex.  The
regex itself matches the string &amp;ldquo;Hello world&amp;rdquo;, group 1 contains
the string &amp;ldquo;Hello&amp;rdquo;, and group 2 contains the string &amp;ldquo;world&amp;rdquo;. 
Note that neither group contains the space between the two words, but it is
part of the full match.  Most implementations of regular expressions
give you a way to access these groups.  They are contained in a
collection inside the match object.  You&amp;#39;ll need to consult your
regex documentation to know the exact layout, but in most of these
implementations there is a zero-based collection where Item 0 is the
full match and Items 1..n are the groups your regex
contains, if any.&lt;/p&gt;
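&lt;p style="margin-bottom:0in;"&gt;As one concrete illustration of that layout, here is a small sketch that assumes Python and its re module (the post is engine-agnostic, so treat the API names as specific to that one implementation). The variable names are made up for the example.&lt;/p&gt;

```python
import re

# Two plain (capturing) groups separated by a literal space (\x20).
pattern = re.compile(r"(Hello)\x20(world)")

match = pattern.search("Hello world")

full_match = match.group(0)   # item 0: the full match, space included
first_word = match.group(1)   # group 1
second_word = match.group(2)  # group 2
all_groups = match.groups()   # groups 1..n as a tuple, without item 0
```

&lt;p style="margin-bottom:0in;"&gt;Other engines expose the same information under different names, so check the documentation for the one you are using.&lt;/p&gt;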

&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;Now if you noticed, I said every pair of
&lt;u&gt;plain&lt;/u&gt;&lt;span style="text-decoration:none;"&gt; parentheses is a
group.  The reason I stress plain is because there are other group
constructs, the aforementioned look-arounds being some of them. Support
for the other constructs varies between implementations, so again consult
your regex documentation to see which ones you have. The other
grouping constructs consist of an open parenthesis immediately
followed by a question mark, which is then followed by the characters
that define that particular grouping construct.  Again, consult your
documentation to see which characters define what.  I&amp;#39;m not going to
go into all of them here, but they basically fall into two categories:
 capturing and non-capturing.  The plain parentheses I&amp;#39;ve mentioned
above are a capturing group.  However, capturing requires extra
resources, and in some cases you need the extra speed but still need to
group a certain part of the pattern together for necessity or
readability or both. This is where you&amp;#39;ll want to use a
non-capturing group (?:pattern): an open parenthesis followed
immediately by a question mark followed immediately by a colon.  The
difference here is that the data matched in the group is not added to
the collection of sub-matches in the Match object.&lt;/span&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Taking our
previous example and making the first group non-capturing&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;code&gt;/&lt;span style="color:green;"&gt;(?:Hello)\x20(world)&lt;/span&gt;/&lt;/code&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Where before we
had two groups, here we only have one.  Group 1 contains the
string &amp;ldquo;world&amp;rdquo;.  Now this example is not a very practical use of a
non-capturing group.  Typically you&amp;#39;d use them in more complex
regexes that need grouping but where you really don&amp;#39;t care about the
sub-matches.&lt;/p&gt;
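&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;A quick way to see the difference, sketched here with Python&amp;#39;s re module as an assumed engine: the compiled pattern reports how many capturing groups it has, and group 1 is now the second word.&lt;/p&gt;

```python
import re

# The first group is non-capturing, so only (world) is numbered.
pattern = re.compile(r"(?:Hello)\x20(world)")

match = pattern.search("Hello world")

full_match = match.group(0)    # still the whole string
only_group = match.group(1)    # the single capturing group
group_count = pattern.groups   # number of capturing groups in the pattern
```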

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;One more quick
thing about capturing groups: basically, each group&amp;#39;s index in the groups
collection is determined by the position of its left parenthesis.  So if you have
nested parentheses, count every (plain) left parenthesis to know which
index to use to reference a group. Some of the advanced grouping constructs
and regex options can affect the ordering, but if you are using them
hopefully you&amp;#39;ve read up on their effects, so I won&amp;#39;t go over that here.&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;code&gt;/&lt;span style="color:green;"&gt;((Hello)|(Goodbye
Cruel))\x20(world)&lt;/span&gt;/&lt;/code&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;The above regex
has 4 capturing groups (not counting group 0). Can you find them? 
It should match either the string &amp;ldquo;Hello world&amp;rdquo; or &amp;ldquo;Goodbye
Cruel world&amp;rdquo;.  I want to point out that not all the groups will
participate in the match, but they are still part of the Groups
collection.  There will always be 4 groups; one will just always be
empty.  Which one depends on which string was matched.&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;If &amp;ldquo;Hello
world&amp;rdquo; was matched the groups are&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Hello&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Hello&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;(empty)&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;world&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;If &amp;ldquo;Goodbye
Cruel world&amp;rdquo; was matched the groups are&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Goodbye
	Cruel&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;(empty)&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Goodbye
	Cruel&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;world&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;In both cases
group 0 would be the full match.&lt;/p&gt;
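&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;The two listings above can be reproduced with a short sketch. It assumes Python&amp;#39;s re module, where a group that did not participate is reported as None rather than as an empty entry; other engines represent the empty group differently. The helper name numbered_groups is made up for this example.&lt;/p&gt;

```python
import re

# Four capturing groups, numbered by the position of each left parenthesis.
pattern = re.compile(r"((Hello)|(Goodbye Cruel))\x20(world)")

def numbered_groups(text):
    """Return groups 1..4 for the first match in text."""
    match = pattern.search(text)
    # A group that took no part in the match comes back as None in Python.
    return match.groups()

hello_groups = numbered_groups("Hello world")
goodbye_groups = numbered_groups("Goodbye Cruel world")
```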

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Notice that in
both cases two groups contain the same value. Even if you need to know
whether &amp;ldquo;Hello&amp;rdquo; or &amp;ldquo;Goodbye Cruel&amp;rdquo; was matched, you certainly
don&amp;#39;t need to know it twice.  Plus the inner parentheses have
different indexes, which you&amp;#39;d have to check if you wanted to use those.  This
is where you&amp;#39;d use non-capturing groups to simplify your groups
collection.&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;code&gt;/&lt;span style="color:green;"&gt;((?:Hello)|(?:Goodbye
Cruel))\x20(world)&lt;/span&gt;/&lt;/code&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Now we are back
down to two groups.  Group 1 contains either &amp;ldquo;Hello&amp;rdquo; or &amp;ldquo;Goodbye
Cruel&amp;rdquo; depending on which string was matched.  Group 2 always
contains &amp;ldquo;world&amp;rdquo;.&lt;/p&gt;
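&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Here is the same check for the simplified pattern, again sketched with Python&amp;#39;s re module as an assumed engine and with made-up variable names. Whichever alternative matches, the greeting is always in group 1 and the collection has no unused entry.&lt;/p&gt;

```python
import re

# The inner alternatives are non-capturing, so only two groups are numbered.
pattern = re.compile(r"((?:Hello)|(?:Goodbye Cruel))\x20(world)")

hello_groups = pattern.search("Hello world").groups()
goodbye_groups = pattern.search("Goodbye Cruel world").groups()
```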

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt; However, keep in
mind that in some cases you&amp;#39;ll want to use the inner index to determine
which alternative was matched.   So using non-capturing groups isn&amp;#39;t
necessarily better; it just depends on whether you need to access
those groups or not. But if you are not doing anything with them,
don&amp;#39;t capture them.&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;These are just
two of the basic grouping constructs, and they are generally supported
across implementations of regex, but not always. If they are, you
can use them to easily dissect larger matches.&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Named Groups to the Rescue</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2005/09/28/12925.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2005/09/28/12925.aspx</id><published>2005-09-28T20:06:00Z</published><updated>2005-09-28T20:06:00Z</updated><content type="html">


&lt;p class="MsoNormal"&gt;I was asked to modify some text that had been built
incorrectly: basically, insert some text at a certain point. First I use a regex
to find the text, then insert the new value within that match. Since the inserted value goes inside the
matched text, I simply wanted to use backreferences and the replace method. Simple, right?
Well, not so much.&lt;/p&gt;




&lt;p class="MsoNormal"&gt;The text is in a field of various rows of a database table, and the
text to be inserted comes from another field in the same row and is an
alphanumeric value. The inserted text
value is dynamic, so I can’t simply hard-code the replacement text; the
replacement text is built dynamically for each row. The text to be modified is a certain
attribute somewhere in the text. For this
example let’s say it’s “id=xyz”, which is constant for all records. The new text will be inserted right after
the equals sign.&lt;/p&gt;


&lt;p class="MsoNormal"&gt;So for&lt;/p&gt;




&lt;p class="MsoNormal"&gt;source&lt;span&gt;&amp;nbsp; &lt;/span&gt;=“ {some stuff}
id=xyz {more stuff}”&lt;br&gt;
newText = “ab1”&lt;/p&gt;




&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;br&gt;
you get&lt;/p&gt;




&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;br&gt;
“{some stuff} id=&lt;font color="#ff0000"&gt;ab1&lt;/font&gt;xyz {more stuff}”&lt;/p&gt;




&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp; &lt;br&gt;
&lt;/o:p&gt;&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Simple enough. You
use the regex \bid=xyz\b to match the
text, then split it into groups so you can use backreferences in the
replacement. So your final regex looks like
this:&lt;/p&gt;




&lt;p class="MsoNormal"&gt;\b(id=)(xyz)\b&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Now group 1 contains the text up to your insertion point
(id=)&lt;/p&gt;


&lt;p class="MsoNormal"&gt;and group 2 contains the text after your insertion point (xyz).&lt;/p&gt;




&lt;p class="MsoNormal"&gt;So the replacement string for your regex is “$1&lt;font color="#008000"&gt;(new data
goes here)&lt;/font&gt;$2”, where &lt;font color="#008000"&gt;(new data goes here)&lt;/font&gt; = some alphanumeric value pulled
from a second field in the row.&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Doing this in .Net, my code looked something like this pseudo-code:&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Regex regexFind = new Regex(@"\b(id=)(xyz)\b");&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Get Records&lt;/p&gt;


&lt;p class="MsoNormal"&gt;For each row&lt;/p&gt;


&lt;p class="MsoNormal"&gt;&lt;span&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt; fieldA
= rowFieldA&lt;span&gt;&amp;nbsp; &lt;/span&gt;(source text)&lt;/p&gt;


&lt;p class="MsoNormal"&gt;&lt;span&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt; fieldB
= rowFieldB (insert value)&lt;/p&gt;


&lt;p class="MsoNormal"&gt;&lt;span&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;fieldA =
regexFind.Replace(fieldA, String.Format("$1{0}$2", fieldB))&lt;/p&gt;


&lt;p class="MsoNormal"&gt;next&lt;/p&gt;




&lt;p class="MsoNormal"&gt;The String.Format call creates a replacement string
for each row.&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Looks good? It works fine
for our example, but there is a problem.&lt;/p&gt;




&lt;p class="MsoNormal"&gt;For our example value of “ab1” String.Format produces “$1ab1$2”, which is
exactly what we want. But this field is alphanumeric, so it could begin with a
digit, which causes a problem. Say
for the next record the value of the text to be inserted is “12a”; the Format
method produces a replacement string of “$112a$2”, which is not good. Syntactically it’s fine but it’s not what we
want, because instead of inserting text between group 1 and group 2
it refers to group 112 followed by “a” and
group 2. As there is no group 112, the engine
treats $112 as literal text, so the match is replaced with “$112axyz” and the “id=” prefix is lost.&lt;/p&gt;




&lt;p class="MsoNormal"&gt;OK, this is where named groups become handy (necessary?). If you use named groups in your regex and
replacement string you can avoid this problem.&lt;/p&gt;






&lt;p class="MsoNormal"&gt;Change the regex to \b(&lt;font color="#ff0000"&gt;?&amp;lt;att&amp;gt;&lt;/font&gt;id=)(&lt;font color="#ff0000"&gt;?&amp;lt;val&amp;gt;&lt;/font&gt;xyz)\b&lt;/p&gt;




&lt;p class="MsoNormal"&gt;And your replacement string to “$&lt;font color="#ff0000"&gt;{att}&lt;/font&gt;(new data goes here)$&lt;font color="#ff0000"&gt;{val}&lt;/font&gt;”&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Now if you are using String.Format there is one
more hoop you have to jump through. Because the regex engine and the Format method both use curly braces, there is a
conflict and the Format method will complain, so you have to write it like this:&lt;/p&gt;




&lt;p class="MsoNormal"&gt;String.Format("${0}att{1}{2}${0}val{1}","&lt;font color="#008000"&gt;{&lt;/font&gt;","&lt;font color="#ff0000"&gt;}&lt;/font&gt;",newValue)&lt;o:p&gt; &lt;br&gt;
&lt;/o:p&gt;&lt;/p&gt;




&lt;p class="MsoNormal"&gt;To get the desired replacement string.&lt;br&gt;
&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;


&lt;p class="MsoNormal"&gt;When I started writing this I thought this was the only way
to get it to work, which would mean you could only solve this with a regex engine, like .Net,
that supports named groups, but I’ve thought of a second way.&lt;br&gt;
&lt;/p&gt;
&lt;p class="MsoNormal"&gt;But whenever a group in your replacement string
can be followed by a digit you may want to consider using named groups
to avoid surprises.&lt;br&gt;
&lt;/p&gt;
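&lt;p class="MsoNormal"&gt;As a rough illustration of the digit problem, here is a minimal sketch in Python rather than .Net (the function name insert_after_equals is made up for this example). Instead of building a replacement string it passes a callback, so the inserted value is never parsed for group references. This is just one way to sidestep the ambiguity and not necessarily the second way mentioned above; the closest .Net counterpart would be a MatchEvaluator delegate.&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Same pattern as in the post: group 1 is the text up to the insertion
# point, group 2 is the text after it.
ID_ATTR = re.compile(r"\b(id=)(xyz)\b")


def insert_after_equals(source, new_text):
    """Insert new_text right after the equals sign of id=xyz.

    The replacement is a callback that receives the match object, so
    new_text is plain data even when it starts with a digit.
    """
    return ID_ATTR.sub(lambda m: m.group(1) + new_text + m.group(2), source)


print(insert_after_equals("{some stuff} id=xyz {more stuff}", "ab1"))
print(insert_after_equals("{some stuff} id=xyz {more stuff}", "12a"))
```
&lt;/pre&gt;
&lt;p class="MsoNormal"&gt;With the sample source and a value of 12a, the call returns the text with id=12axyz in place of id=xyz.&lt;/p&gt;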


</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Making your regex 
code ready.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2005/05/18/934.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2005/05/18/934.aspx</id><published>2005-05-18T16:44:00Z</published><updated>2005-05-18T16:44:00Z</updated><content type="html">&lt;p class="MsoNormal"&gt;There are times when a regular expression you’ve written, or
someone has written for you, needs a little tweaking before you add it to your code,
because the syntax of the language conflicts with
your regex. For example, part of
your regex pattern contains a double quote and the language you are using uses
double quotes as string delimiters. If
you just cut and paste the pattern into your code, the pattern’s quotation mark will
terminate your string prematurely. The usual
way to fix it is to escape the quotation mark in the pattern. This solution
requires altering the regex, and how the character is escaped depends on the
language being used. The regex itself
allows you to escape a character with the \ character. The language being used may or may not
recognize that as an escape character in its own syntax. And it may be confusing later when you look
at the regex and can’t remember why you escaped a character that the pattern
itself doesn’t need escaped. But there is another way: hex values.&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Most regex implementations support a hex syntax \x##,&lt;span style=""&gt;&amp;nbsp; &lt;/span&gt;where # is a hex digit. &lt;/p&gt;


&lt;p class="MsoNormal"&gt;So if you use \x22 instead of double quote and \x27 for
single quotes the regexes become more cookie cutter ready.&lt;/p&gt;


&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Another useful hex value is \x20, which is a space. This is especially useful in .Net, where
there is a regex option (RegexOptions.IgnorePatternWhitespace) to ignore white space in the pattern. Turning this option on allows end-of-line comments
in the regex but, except inside a character class, ignores spaces typed
in the pattern, which is a problem if a space is part
of what the pattern has to match. So you could
break a working regex if you later decide to add this option. This happened on
Regexlib when the option was first turned on: a lot of patterns that were written before
the switch was flipped suddenly stopped working.&lt;/p&gt;


&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Speaking of .Net, when it comes to named groups you can’t use
the hex notation for the quotes when you define the group name with the single-quote syntax, (?'name'...). However, you can avoid any issue with single
quotes by using the alternate angle-bracket syntax, (?&amp;lt;name&amp;gt;...).&lt;/p&gt;
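&lt;p class="MsoNormal"&gt;To make the hex idea concrete, here is a small sketch in Python rather than .Net; the sample strings and the names QUOTED, TYPED_SPACE, HEX_SPACE and quoted_parts are made up for illustration. Python's verbose flag plays roughly the role of the ignore-white-space option described above.&lt;/p&gt;
&lt;pre&gt;
```python
import re

# \x22 stands in for the double quote, so the pattern needs no
# language-level escaping whichever string delimiter is used.
QUOTED = re.compile(r"\x22[^\x22]*\x22")

# In verbose mode a typed space is dropped from the pattern,
# while \x20 still matches a real space.
TYPED_SPACE = re.compile(r"regex advice  # end-of-line comment", re.VERBOSE)
HEX_SPACE = re.compile(r"regex\x20advice  # end-of-line comment", re.VERBOSE)


def quoted_parts(text):
    """Return every double-quoted run found in text, quotes included."""
    return QUOTED.findall(text)


print(quoted_parts('say "hi" and "bye"'))
print(bool(TYPED_SPACE.fullmatch("regex advice")))  # typed space was ignored
print(bool(HEX_SPACE.fullmatch("regex advice")))    # \x20 survived
```
&lt;/pre&gt;
&lt;p class="MsoNormal"&gt;The pattern with the typed space only matches the text with no space in it once the flag is on, which is the kind of breakage described above; the \x20 version keeps matching either way.&lt;/p&gt;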
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Word Break.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2005/02/09/324.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2005/02/09/324.aspx</id><published>2005-02-09T18:17:00Z</published><updated>2005-02-09T18:17:00Z</updated><content type="html">
The list-detail design is very commonly used on web pages, where you have a list of links that lead to more detailed information on each entry.  Sometimes the text in the list is simply a snippet of a much longer string of text in the detail.  A common way to handle this is to use a string function to return the first n characters of the string and display that in the list.  The problem is that this tends to make the break right in the middle of a word, which isn’t a major problem but can be aesthetically displeasing or may accidentally form another word you didn’t mean to put on your site. 

When facing this issue I came up with a simple regex to allow me to break on whole words.

^(?:[ -~]{n,m}(?:$|(?:[\w!?.])\s))


Where n = the minimum number of characters to match
And m = the maximum number of characters to allow before the closing word character and white space, so the whole match can be up to m + 2 characters long. 

Now in this instance I’m considering a word to be one or more ASCII non-white-space characters.  The way it works is that after matching at least n ASCII characters it tries to match either the end of the string, or a word character or sentence-ending punctuation mark followed by white space.  So it will accept as many characters, including white space, as it can up to m and still satisfy the rest of the match; otherwise it backtracks until the regex is satisfied. So if you wanted a minimum of 2 characters and a maximum of 75 the regex would be

^(?:[ -~]{2,75}(?:$|(?:[\w!?.])\s))

and if you applied it to the Gettysburg Address

“Four score and seven years ago, our fathers brought forth upon this continent a new nation: conceived in liberty, and dedicated to the proposition that all men are created equal.
…” (only the 1st paragraph shown for the example but you could apply the full text)

Taking the first match you get

“Four score and seven years ago, our fathers brought forth upon this ”


There are a few problems with the regex that can be improved.  First off, it only accepts basic ASCII displayable characters, decimal 32 to 126, which means the text must be in that range.  I did it this way because it gives you the US alphabet, digits and commonly used symbols and punctuation, which was all I needed at the time. Other characters would need to be added.  Also, if the character count of the first word exceeds your maximum length, no match will be found.


You can make this regex a little more dynamic by putting it inside a function that takes your string and the min and max values as input.
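
A minimal sketch of such a function, written in Python as an assumption (the post does not say which language was used, and the name snippet is made up for this example). It drops the min and max values into the regex above and returns the matched text, or None when nothing matches.

```python
import re


def snippet(text, min_len, max_len):
    """Return a leading snippet of text that ends on a whole word.

    Builds the word-break regex with min_len and max_len as the {n,m}
    bounds. Returns None when no match is found, for example when the
    first word alone is longer than max_len.
    """
    pattern = r"^(?:[ -~]{%d,%d}(?:$|(?:[\w!?.])\s))" % (min_len, max_len)
    match = re.match(pattern, text)
    return match.group(0) if match else None


address = ("Four score and seven years ago, our fathers brought forth upon "
           "this continent a new nation: conceived in liberty, and dedicated "
           "to the proposition that all men are created equal.")

print(repr(snippet(address, 2, 75)))
print(repr(snippet("Supercalifragilistic", 2, 5)))
```

With 2 and 75 this gives back the same first match shown above, trailing space included, and the second call returns None because the first word is longer than the maximum.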

</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Making Dynamic XHTML pages valid with a Regex</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2005/01/26/323.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2005/01/26/323.aspx</id><published>2005-01-26T21:47:00Z</published><updated>2005-01-26T21:47:00Z</updated><content type="html">&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;I have a website that I use primarily to expand and sharpen my web development skills. My latest effort in that regard has been writing valid markup. Although the site has some usable function, I use it mainly for practice, so I don&amp;#8217;t work on it full time. Instead I work on it in spurts, with different editors, applying new scripts, new ideas and such at different times, so the output is far from consistent. So I went to a lot of effort to first make my HTML valid, then later convert that HTML to XHTML and make that valid.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;However, since the content of most of the pages is dynamically generated, it didn&amp;#8217;t occur to me right away when validating the XHTML that something in the content could make my page fail validation in certain situations. Case in point: ampersands. According to the W3C (&lt;/FONT&gt;&lt;A href="http://www.w3.org/TR/xhtml1/#C_12"&gt;&lt;FONT face="Times New Roman"&gt;http://www.w3.org/TR/xhtml1/#C_12&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face="Times New Roman"&gt;) you should not have solitary ampersand characters in your markup. Ampersands declare the beginning of an entity reference. If you want to display the ampersand character you should use the &amp;amp;amp; entity reference instead of a bare ampersand (&amp;amp;). Keep in mind most browsers let you get away with incorrect usage, so I didn&amp;#8217;t have to do anything to get my page to render, but seeing as the whole point of writing valid markup is not to depend on the browser fixing my sloppy code, I didn&amp;#8217;t want to just let this go.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;The first time I encountered this problem on a page the fix was pretty simple. I was working with ASP, so I simply wrapped ASP&amp;#8217;s Server.HTMLEncode method around the variables containing the database-derived content. Simple. Problem solved. Ummm, not quite. The problem occurred on another page, inside a dropdown list. My first thought, to use the HTMLEncode method again, was a horrible failure. Unlike my previous fix, where I was pulling one row&amp;#8217;s column, let&amp;#8217;s say First Name, out of a database and sticking it in a single variable for later use, all the markup for the dropdown was being created in a separate function using ADO&amp;#8217;s GetString method. So the variable containing the dropdown contained the full XHTML markup (the select and option tags). HTMLEncode turns the less-than and greater-than signs of the tags into entities, which in turn basically displays your intended markup source code on the page.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Again, the intent is to turn the solitary ampersands (&amp;amp;) into the &amp;amp;amp; entity. So I decided to use a regex to fix this problem. I modified this regex &lt;/FONT&gt;&lt;A href="http://www.regexlib.com/REDetails.aspx?regexp_id=626"&gt;&lt;FONT face="Times New Roman"&gt;http://www.regexlib.com/REDetails.aspx?regexp_id=626&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face="Times New Roman"&gt;, which matches entities, to this:&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; tab-stops: 51.0pt"&gt;&lt;SPAN style="mso-tab-count: 1"&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;&amp;amp;(?!(?i:\#((x([\dA-F]){1,5})|(104857[0-5]|10485[0-6]\d|1048[0-4]\d\d|104[0-7]\d{3}|10[0-3]\d{4}|0?\d{1,6}))|([A-Za-z\d.]{2,31}));)&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&lt;FONT face="Times New Roman"&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;This matches ampersands that are not part of an entity. So using the replace pattern of &amp;#8220;&amp;amp;amp;&amp;#8221; I can replace just the solitary ampersands.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Now some of you might think this is overkill since I could use VBScript&amp;#8217;s Replace function to get the result I want. And in this very specific case that is true. There were only a handful of ampersands to replace. However, this issue came up in other places where a simple replace would not be easily implemented. The truth of the matter is I didn&amp;#8217;t come up with this regex for this problem. It was originally designed for dealing with XML files that contained solitary ampersands AND entity references. The VB or VBScript Replace won&amp;#8217;t work in that case. For example, take the following line:&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;&amp;lt;STATEMENT&amp;gt;(1&amp;nbsp;&amp;amp;lt; 2) &amp;amp; (1&amp;nbsp;&amp;amp;lt; 4) are both true &amp;lt;/STATEMENT&amp;gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt; 
&lt;font face="Times New Roman"&gt;Using HTMLEncode you get&lt;/font&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="margin: 0in 0in 0pt; text-indent: 0.5in;"&gt;&lt;font face="Times New Roman"&gt;&lt;span style=""&gt;&amp;nbsp;&lt;/span&gt;&amp;amp;lt;STATEMENT&amp;amp;gt; (1 &amp;amp;amp;lt; 2) &amp;amp;amp; (1 &amp;amp;amp;lt; 4) are both true &amp;amp;lt;/STATEMENT&amp;amp;gt;&lt;/font&gt;&lt;/p&gt;

&lt;p class="MsoNormal" style="margin: 0in 0in 0pt; text-indent: 0.5in;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="margin: 0in 0in 0pt; text-indent: 0.5in;"&gt;&lt;font face="Times New Roman"&gt;Using the VB replace function&lt;span style=""&gt;&amp;nbsp; &lt;/span&gt;you get &lt;/font&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="margin: 0in 0in 0pt; text-indent: 0.5in;"&gt;&lt;font face="Times New Roman"&gt;&amp;lt;STATEMENT&amp;gt;(1 &amp;amp;amp;lt; 2) &amp;amp;amp; (1 &amp;amp;amp;lt; 4) are both true &amp;lt;/STATEMENT&amp;gt;&lt;/font&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="margin: 0in 0in 0pt; text-indent: 0.5in;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="margin: 0in 0in 0pt; text-indent: 0.5in;"&gt;&lt;font face="Times New Roman"&gt;Neither being what you want.&lt;/font&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="margin: 0in 0in 0pt; text-indent: 0.5in;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="margin: 0in 0in 0pt; text-indent: 0.5in;"&gt;&lt;font face="Times New Roman"&gt;So the regex replace allows your content to contain entities and won&amp;#8217;t mess them up and you get the desired output&lt;/font&gt;&lt;/p&gt;

&lt;p class="MsoNormal" style="margin: 0in 0in 0pt; text-indent: 0.5in;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="margin: 0in 0in 0pt;"&gt;&lt;font face="Times New Roman"&gt;&amp;lt;STATEMENT&amp;gt;(1 &amp;amp;lt; 2) &amp;amp;amp; (1 &amp;amp;lt; 4) are both true &amp;lt;/STATEMENT&amp;gt;&lt;/font&gt;&lt;/p&gt;
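&lt;p class="MsoNormal" style="margin: 0in 0in 0pt;"&gt;&lt;font face="Times New Roman"&gt;The replace step can be sketched roughly as follows. This is Python rather than the ASP/VBScript used on the site, and the pattern is a deliberately simplified stand-in for the regex above (it accepts any decimal reference, hex reference or name followed by a semicolon and does not range-check code points). The names AMP, SOLITARY_AMP and escape_solitary_ampersands are made up for the example, and the ampersand is written as \x26 so the listing contains no raw markup characters.&lt;/font&gt;&lt;/p&gt;
&lt;pre&gt;
```python
import re

AMP = "\x26"  # the ampersand character, written as a hex escape

# Simplified assumption: an entity is an ampersand, then a decimal
# reference, a hex reference or a name, then a semicolon. The negative
# lookahead skips ampersands that already start one of those.
SOLITARY_AMP = re.compile(
    r"\x26(?!(?:#\d+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z\d.]*);)"
)


def escape_solitary_ampersands(markup):
    """Replace each solitary ampersand with the amp entity."""
    return SOLITARY_AMP.sub(AMP + "amp;", markup)


# The inner text of the sample line, with its two lt entities.
line = "(1 " + AMP + "lt; 2) " + AMP + " (1 " + AMP + "lt; 4) are both true"
print(escape_solitary_ampersands(line))
```
&lt;/pre&gt;
&lt;p class="MsoNormal" style="margin: 0in 0in 0pt;"&gt;&lt;font face="Times New Roman"&gt;Only the ampersand between the two parenthesised comparisons is rewritten; the two existing entities pass through untouched, and running the function a second time changes nothing.&lt;/font&gt;&lt;/p&gt;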
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Higher plane solution</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2004/11/08/322.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2004/11/08/322.aspx</id><published>2004-11-08T17:45:00Z</published><updated>2004-11-08T17:45:00Z</updated><content type="html">&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;I&amp;#8217;ve come up with a workaround to the &lt;A href="http://regexadvice.com/blogs/mash/archive/2004/06/08/1247.aspx"&gt;Unicode plane bug&lt;/A&gt; I talked about earlier.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt; &lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Originally I was trying to identify Unicode values outside the ASCII range, that is, code points higher than 127, for a replace. However, the problem was that characters in any plane other than plane zero returned two matches per character, meaning I could get two incorrect replacements for those characters. It turns out the replacement would simply not take place, which was still a problem.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt; &lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;The following two regexes allowed me to get around this behavior.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;FONT face="Times New Roman"&gt;&lt;SPAN style="mso-tab-count: 1"&gt;            &lt;/SPAN&gt;Plane 0 excluding standard ASCII pattern&lt;SPAN style="mso-spacerun: yes"&gt;  &lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"&gt;&lt;A href="http://www.regexlib.com/REDetails.aspx?regexp_id=917"&gt;(?![\uD800-\uDBFF])(?![\uDC00-\uDFFF])[\u0080-\uFFFF]&lt;o:p&gt;&lt;/o:p&gt;&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;FONT face="Times New Roman"&gt;&lt;SPAN style="mso-tab-count: 1"&gt;            &lt;/SPAN&gt;Non-Plane 0 pattern &lt;/FONT&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"&gt;&lt;A href="http://www.regexlib.com/REDetails.aspx?regexp_id=918"&gt;[\uD800-\uDBFF][\uDC00-\uDFFF]&lt;o:p&gt;&lt;/o:p&gt;&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt; &lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;The ranges D800-DBFF and DC00-DFFF are the high and low surrogate values, respectively. &lt;A href="http://www.i18nguy.com/unicode/surrogatetable.html"&gt;Surrogate code points are the values, in the UTF-16 encoding, of the two 16-bit code units that make up a &lt;B&gt;Supplementary Character&lt;o:p&gt;&lt;/o:p&gt;&lt;/B&gt;&lt;/A&gt;.&lt;/FONT&gt;&lt;/P&gt;
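&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;As a rough illustration of how a supplementary code point maps onto those two ranges, here is a minimal JavaScript sketch. JavaScript strings are UTF-16, so the arithmetic can be checked against the built-in String.fromCodePoint. The helper name toSurrogates is made up for this example and is not part of any library.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
// Split a supplementary code point (above U+FFFF) into its UTF-16 surrogates.
// toSurrogates is a hypothetical helper name used only for this sketch.
function toSurrogates(codePoint) {
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xD800 + (offset >> 10);    // top 10 bits land in D800-DBFF
  const low = 0xDC00 + (offset % 0x400);   // bottom 10 bits land in DC00-DFFF
  return [high, low];
}

const [high, low] = toSurrogates(0x10308); // 0x10308 is 66312 in decimal
const viaBuiltin = String.fromCodePoint(0x10308);

console.log(high.toString(16), low.toString(16)); // d800 df08
console.log(viaBuiltin.length);                   // 2 (two UTF-16 code units)
console.log(viaBuiltin.charCodeAt(0) === high, viaBuiltin.charCodeAt(1) === low); // true true
```
&lt;/PRE&gt;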
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt; &lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;You may notice that in the plane 0 pattern the two surrogate ranges are in separate lookaheads. You might think that they could be combined into one lookahead, but this won&amp;#8217;t work. I&amp;#8217;m not exactly sure why this is, but doing so will get you a match on the low surrogate of a non-plane 0 character.&lt;/FONT&gt;&lt;/P&gt;
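&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;One possible explanation, assuming the combined form puts the two ranges one after the other inside a single lookahead: that lookahead only rejects a position where a high surrogate is immediately followed by a low surrogate. At the low surrogate itself the next code unit is not a high surrogate, so the lookahead passes and the low surrogate is matched. The sketch below uses JavaScript without the u flag, which matches on UTF-16 code units; that the .Net engine behaves the same way here is an assumption.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
// The plane 0 pattern from the post, with separate lookaheads.
const separate = /(?![\uD800-\uDBFF])(?![\uDC00-\uDFFF])[\u0080-\uFFFF]/g;
// Assumed shape of the combined version (not quoted from the post).
const combined = /(?![\uD800-\uDBFF][\uDC00-\uDFFF])[\u0080-\uFFFF]/g;

// A plane 1 character; its code units are D800 DF08.
const plane1 = String.fromCodePoint(0x10308);

const hex = (m) => m.charCodeAt(0).toString(16);
const separateHits = (plane1.match(separate) || []).map(hex);
const combinedHits = (plane1.match(combined) || []).map(hex);

console.log(separateHits); // []
console.log(combinedHits); // [ 'df08' ]  (the low surrogate slips through)
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;A single lookahead over the merged range, (?![\uD800-\uDFFF]), rejects both halves of a pair and should behave like the two separate lookaheads.&lt;/FONT&gt;&lt;/P&gt;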
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;&lt;/FONT&gt; &lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;I've posted these patterns to Regexlib. Unfortunately, most of the characters used in the examples can't be displayed on the site. The sample characters I used are:&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Plane 0 : &amp;#9787;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Plane 1:  &amp;#66312;, &amp;#66376; and &amp;#66575;&lt;/FONT&gt;&lt;/P&gt;
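&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;To show the two patterns working together on the sample characters, here is a small JavaScript sketch (no u flag, so matching is per UTF-16 code unit). The U+XXXX replacement text and the label helper are made up for the example; the post does not say what the original replacement was.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
const plane0NonAscii = /(?![\uD800-\uDBFF])(?![\uDC00-\uDFFF])[\u0080-\uFFFF]/g;
const nonPlane0 = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;

// Hypothetical replacement: show each matched character as U+XXXX.
const label = (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase();

// Sample characters from the post: 9787 (plane 0) and 66312 (plane 1).
const input = "a" + String.fromCodePoint(9787) + "b" + String.fromCodePoint(66312);

// Handle surrogate pairs first, then the remaining plane 0 characters.
const output = input.replace(nonPlane0, label).replace(plane0NonAscii, label);

console.log(output); // aU+263BbU+10308
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Each character, whatever its plane, is matched exactly once, so it gets exactly one replacement.&lt;/FONT&gt;&lt;/P&gt;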
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Flavor of the month</title><link rel="alternate" type="text/html" 
href="http://regexadvice.com/blogs/mash/archive/2004/10/05/321.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2004/10/05/321.aspx</id><published>2004-10-05T19:58:00Z</published><updated>2004-10-05T19:58:00Z</updated><content type="html">&lt;SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: 'Times New Roman'; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-bidi-language: AR-SA; mso-fareast-language: EN-US"&gt;It seems most regex engines based their implementation on what had been done in Perl, with some implementing a little more, others a little less. This is what generally keeps regexes from being portable across different implementations. It often leads to confusion and syntax errors when you take a canned regex that was designed or tested in a different environment from the one where it will be used. I&amp;#8217;ve talked before about &lt;A href="http://regexadvice.com/blogs/mash/archive/2004/07/13/1355.aspx"&gt;knowing the syntax &lt;/A&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;of the implementation where the regex will be used, to save yourself the trouble of trying to fix regexes that aren&amp;#8217;t incorrect but were simply written for a different engine.&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;There is a new testing tool on Regexlib. It&amp;#8217;s pretty neat and a great improvement over the previous version. However, it is pretty much targeted at Microsoft regex engines. Realizing that not everyone works in that environment and, more importantly, to give myself (and others) a chance to play around with regular expression syntax that those engines currently support incorrectly or not at all, I&amp;#8217;ve found several online regex testers for a variety of languages and platforms.&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;Environment/Language&lt;SPAN style="mso-tab-count: 1"&gt;&amp;nbsp; &lt;/SPAN&gt;Site&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;Microsoft (.Net, VBScript, JScript)&lt;SPAN style="mso-tab-count: 2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;A href="http://www.regexlib.com/ReTester.aspx"&gt;http://www.regexlib.com/ReTester.aspx&lt;/A&gt; (currently requires IE for VBScript and JScript testing)&lt;/P&gt;&lt;/DIV&gt;
&lt;LI&gt;
&lt;DIV class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;Javascript&lt;SPAN style="mso-tab-count: 1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;A href="http://www.unusualresearch.com/regex/toolrei.htm"&gt;http://www.unusualresearch.com/regex/toolrei.htm&lt;/A&gt;&amp;nbsp;or &lt;A href="http://www.regular-expressions.info/javascriptexample.html"&gt;http://www.regular-expressions.info/javascriptexample.html&lt;/A&gt;&lt;/P&gt;&lt;/DIV&gt;
&lt;LI&gt;Java 
&lt;OL&gt;
&lt;LI&gt;
&lt;DIV class=MsoNormal style="MARGIN: 0in 0in 0pt" align=left&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;Jakarta ORO (Perl5, AWK, GLOB)&lt;SPAN style="mso-tab-count: 1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;A href="http://jakarta.apache.org/oro/demo.html"&gt;http://jakarta.apache.org/oro/demo.html&lt;/A&gt;&lt;/P&gt;&lt;/DIV&gt;
&lt;LI&gt;
&lt;DIV class=MsoNormal style="MARGIN: 0in 0in 0pt" align=left&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: 'Times New Roman'; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-bidi-language: AR-SA; mso-fareast-language: EN-US"&gt;Jakarta Regexp&lt;SPAN style="mso-tab-count: 1"&gt; &lt;/SPAN&gt;&lt;A href="http://jakarta.apache.org/regexp/applet.html"&gt;http://jakarta.apache.org/regexp/applet.html&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/DIV&gt;
&lt;LI&gt;
&lt;DIV class=MsoNormal style="MARGIN: 0in 0in 0pt" align=left&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;java.util.regex &lt;A href="http://www.fileformat.info/tool/regex.htm"&gt;http://www.fileformat.info/tool/regex.htm&lt;/A&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/LI&gt;&lt;/OL&gt;
&lt;OL&gt;&lt;/OL&gt;
&lt;LI&gt;
&lt;DIV class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;PHP (preg, ereg) &lt;A href="http://www.quanetic.com/regex.php"&gt;http://www.quanetic.com/regex.php&lt;/A&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;Please refer to each site&amp;#8217;s documentation (if there is any) for which features and advanced syntax are supported. Here is a quick highlight of something each one supports that the others may or may not.&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;.Net = Named Groups and Look-Behinds&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;ORO = POSIX syntax&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;java.util.regex = Character Class Subtraction, Possessive quantifiers&lt;o:p&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&amp;nbsp;&lt;/P&gt;&lt;/SPAN&gt;
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>JScript Bug</title><link rel="alternate" type="text/html" 
href="http://regexadvice.com/blogs/mash/archive/2004/10/05/320.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2004/10/05/320.aspx</id><published>2004-10-05T15:26:00Z</published><updated>2004-10-05T15:26:00Z</updated><content type="html">&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT size=2&gt;I saw a comment on Regexlib that the expression &lt;/FONT&gt;&lt;/P&gt;&lt;PRE&gt;&lt;FONT size=2&gt;&lt;SPAN style="mso-tab-count: 1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;A href="http://www.regexlib.com/REDetails.aspx?regexp_id=598"&gt;^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$&lt;/A&gt;&lt;/FONT&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;FONT size=2&gt;didn&amp;#8217;t work for JavaScript. &lt;/FONT&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;FONT size=2&gt;I responded mainly because the alternative offered wasn&amp;#8217;t really an equivalent expression. But when I looked at the original I really didn&amp;#8217;t see why it shouldn&amp;#8217;t work in JavaScript, and there isn&amp;#8217;t any reason. It does work fine in JavaScript. However, it DOES NOT work in JScript (or VBScript).&lt;/FONT&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;FONT size=2&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;FONT size=2&gt;This expression is supposed to match a string of 8 to 15 characters containing at least one digit, one lowercase and one uppercase English letter. &lt;/FONT&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;FONT size=2&gt;The suggested use is for password requirements.&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;There appears to be a bug in the client-side Microsoft regex engines.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Here&amp;#8217;s what&amp;#8217;s expected to happen.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Each look-ahead test is executed one at a time.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-INDENT: 0.5in"&gt;&lt;FONT face="Times New Roman"&gt;1) Is there a digit in the string?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-INDENT: 0.5in"&gt;&lt;FONT face="Times New Roman"&gt;2) Is there a lowercase letter in the string? &lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-INDENT: 0.5in"&gt;&lt;FONT face="Times New Roman"&gt;3) Is there an uppercase letter in the string?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;If all three are true, the final test for the string being 8 to 15 characters is executed. If this is also true, we have a match. This is how the engine works in .Net (and in non-Microsoft regex engines).&lt;/FONT&gt;&lt;/P&gt;
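&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;The expected evaluation can be spelled out as four independent checks. The following JavaScript sketch compares the original expression with that step-by-step version; meetsRules is a hypothetical helper written for this illustration, and the sample strings other than 123456aB7 are made up.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
const original = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/;

// The same questions asked one at a time (hypothetical helper).
function meetsRules(s) {
  const checks = [
    /\d/,        // 1) is there a digit?
    /[a-z]/,     // 2) is there a lowercase letter?
    /[A-Z]/,     // 3) is there an uppercase letter?
    /^.{8,15}$/  // finally: is the string 8 to 15 characters long?
  ];
  return checks.every((re) => re.test(s));
}

for (const s of ["123456aB7", "abcdefgh", "aB1"]) {
  console.log(s, original.test(s), meetsRules(s));
}
// 123456aB7 true true
// abcdefgh false false
// aB1 false false
```
&lt;/PRE&gt;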
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Here is what&amp;#8217;s happening with the client-side (VBScript/JScript) Microsoft regex engines.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.75in; TEXT-INDENT: -0.25in; mso-list: l0 level1 lfo1; tab-stops: list .75in"&gt;&lt;FONT face="Times New Roman"&gt;&lt;SPAN style="mso-list: Ignore"&gt;1)&lt;SPAN style="FONT: 7pt 'Times New Roman'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;Is there a digit, followed by 7 to 14 characters?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.75in; TEXT-INDENT: -0.25in; mso-list: l0 level1 lfo1; tab-stops: list .75in"&gt;&lt;FONT face="Times New Roman"&gt;&lt;SPAN style="mso-list: Ignore"&gt;2)&lt;SPAN style="FONT: 7pt 'Times New Roman'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;Is there a lowercase English letter, followed by 7 to 14 characters?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt 0.75in; TEXT-INDENT: -0.25in; mso-list: l0 level1 lfo1; tab-stops: list .75in"&gt;&lt;FONT face="Times New Roman"&gt;&lt;SPAN style="mso-list: Ignore"&gt;3)&lt;SPAN style="FONT: 7pt 'Times New Roman'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;Is there an uppercase English letter, followed by 7 to 14 characters?&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;After one look-ahead finds the character it&amp;#8217;s looking for, it tries to satisfy the .{8,15} portion of the regex. It does this for each look-ahead, making the regex rather sporadic in its matching. Using &lt;A href="http://www.regexlib.com/RETester.aspx"&gt;Regexlib's testing tool&lt;/A&gt;, 123456aB7 should match, as it does in .Net, but it doesn&amp;#8217;t, because the lowercase and uppercase letters aren&amp;#8217;t each followed by 7 more characters. However, aB12345678 matches because there are 7 characters (but not the same 7) after each letter case and the digit. So in effect this forces the input to be at least 10 characters, because the client-side evaluation isn&amp;#8217;t possible in fewer characters. And all the look-ahead conditions must be satisfied by the 8&lt;SUP&gt;th&lt;/SUP&gt; character from the end of the string.&lt;/FONT&gt;&lt;/P&gt;
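&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;The behaviour described above can be approximated in a standards-compliant engine by requiring each looked-for character to be followed by 7 to 14 more characters. This is only an emulation of the description in this post, written as an assumption for illustration; it is not the JScript engine and has not been checked against it. The third sample string is made up.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
// The original expression, evaluated the expected way.
const compliant = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/;

// Emulation (assumption): each required character must itself be
// followed by 7 to 14 characters before the end of the string.
const emulated = /^(?=.*\d.{7,14}$)(?=.*[a-z].{7,14}$)(?=.*[A-Z].{7,14}$).{8,15}$/;

for (const s of ["123456aB7", "aB12345678", "aB1234567"]) {
  console.log(s, compliant.test(s), emulated.test(s));
}
// 123456aB7 true false
// aB12345678 true true
// aB1234567 true false
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Under this emulation the 9-character string aB1234567 fails, which is consistent with the 10-character minimum noted above.&lt;/FONT&gt;&lt;/P&gt;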
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author><category term="Bugs/Quirks" scheme="http://regexadvice.com/blogs/mash/archive/tags/Bugs_2F00_Quirks/default.aspx" 
/></entry><entry><title>Switched at birth</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2004/09/13/319.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2004/09/13/319.aspx</id><published>2004-09-13T15:28:00Z</published><updated>2004-09-13T15:28:00Z</updated><content type="html">&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;There seems to be another bug in the .Net regex engine concerning Unicode.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;The characters matched by the Unicode categories for quotes are backwards.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; TEXT-INDENT: 0.5in"&gt;&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Pi &amp;#8211; &lt;I&gt;InitialQuotePunctuation&lt;/I&gt; Indicates that the character is an opening or initial quotation mark. Signified by the Unicode designation "Pi" ( punctuation, initial quote ).&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;Pf &amp;#8211; &lt;I&gt;FinalQuotePunctuation&lt;/I&gt; Indicates that the character is a closing or final quotation mark. Signified by the Unicode designation "Pf" ( punctuation, final quote )&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;However \p{Pi} will match a closing quote and \p{Pf} will match an opening quote. This is backwards.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&lt;FONT face="Times New Roman"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;I should note the quote characters in question are the Unicode left and right &lt;SPAN style="COLOR: red"&gt;&amp;#8216;&lt;/SPAN&gt;single&lt;SPAN style="COLOR: red"&gt;&amp;#8217;&lt;/SPAN&gt; and &lt;SPAN style="COLOR: red"&gt;&amp;#8220;&lt;/SPAN&gt;double&lt;SPAN style="COLOR: red"&gt;&amp;#8221;&lt;/SPAN&gt; quotation marks not the ASCII double quotes (&lt;SPAN style="COLOR: red"&gt;"&lt;/SPAN&gt;) or apostrophe (&lt;SPAN style="COLOR: red"&gt;'&lt;/SPAN&gt;)&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/P&gt;
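&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Times New Roman"&gt;For comparison, here is a small JavaScript sketch of which category each of these four quotation marks falls into in an engine that follows the Unicode category assignments (left marks are Pi, right marks are Pf). It assumes a JavaScript runtime with Unicode property escapes and the u flag, such as a recent Node.js; the names in the quotes object are made up for the example.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
// Unicode property escapes need the u flag.
const initialQuote = /^\p{Pi}$/u;
const finalQuote = /^\p{Pf}$/u;

const quotes = {
  leftSingle: "\u2018",
  rightSingle: "\u2019",
  leftDouble: "\u201C",
  rightDouble: "\u201D"
};

for (const [name, ch] of Object.entries(quotes)) {
  console.log(name, "Pi:", initialQuote.test(ch), "Pf:", finalQuote.test(ch));
}
// leftSingle Pi: true Pf: false
// rightSingle Pi: false Pf: true
// leftDouble Pi: true Pf: false
// rightDouble Pi: false Pf: true
```
&lt;/PRE&gt;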
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author><category term="Bugs/Quirks" 
scheme="http://regexadvice.com/blogs/mash/archive/tags/Bugs_2F00_Quirks/default.aspx" /></entry></feed>