<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://regexadvice.com/utility/FeedStylesheets/atom.xsl" media="screen"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><title type="html">Michael Ash's Regex Blog</title><subtitle type="html">Regex Musings</subtitle><id>http://regexadvice.com/blogs/mash/atom.aspx</id><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/default.aspx" /><link rel="self" type="application/atom+xml" href="http://regexadvice.com/blogs/mash/atom.aspx" /><generator uri="http://communityserver.org" version="2.1.60809.935">Community Server</generator><updated>2005-02-09T13:17:00Z</updated><entry><title>Full Circle - JavaScript Date validation with regex</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2011/11/01/Full-Circle-_2D00_-JavaScript-Date-validation-with-regex.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2011/11/01/Full-Circle-_2D00_-JavaScript-Date-validation-with-regex.aspx</id><published>2011-11-01T06:25:00Z</published><updated>2011-11-01T06:25:00Z</updated><content type="html">Several year ago I decided to learn Regular Expressions. I had been skimming over an article about using regex for validation on web pages. It was at least a couple of weeks before I sat down an read the article in full. After I did I wanted to try to write a regex on my one but couldn&amp;#39;t think of anything original. Eventually I wrote a date regex that validated leap years as well, something none of the other regex I had seen at the time did. 
In the spirit of the article that introduced regular...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2011/11/01/Full-Circle-_2D00_-JavaScript-Date-validation-with-regex.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=84156" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author><category term="Dates" scheme="http://regexadvice.com/blogs/mash/archive/tags/Dates/default.aspx" /><category term="Leap year" scheme="http://regexadvice.com/blogs/mash/archive/tags/Leap+year/default.aspx" /><category term="Regex" scheme="http://regexadvice.com/blogs/mash/archive/tags/Regex/default.aspx" /><category term="JavaScript" scheme="http://regexadvice.com/blogs/mash/archive/tags/JavaScript/default.aspx" /></entry><entry><title>Got (X)HTML? Use the DOM</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2011/01/23/Regex-and-HTML.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2011/01/23/Regex-and-HTML.aspx</id><published>2011-01-24T03:04:00Z</published><updated>2011-01-24T03:04:00Z</updated><content type="html">By far the most common request I see in people wanting regex help is someone wanting to use a regex to parse HTML. Generally I ignore those questions. If I do respond, my response is &amp;ldquo;Don&amp;rsquo;t use a regex to parse HTML. Use the HTML DOM&amp;rdquo; Same is true for XML, with the XML DOM, but doing it for HTML is even worse. Not that my advice stops people from trying to do it anyway but giving them proper warning I feel no remorse in the self-inflicted pain they illicit ignoring my advice. 
You...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2011/01/23/Regex-and-HTML.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=77190" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author><category term="Regex" scheme="http://regexadvice.com/blogs/mash/archive/tags/Regex/default.aspx" /><category term="HTML" scheme="http://regexadvice.com/blogs/mash/archive/tags/HTML/default.aspx" /><category term="DOM" scheme="http://regexadvice.com/blogs/mash/archive/tags/DOM/default.aspx" /></entry><entry><title>Grading the Guidelines.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2010/04/10/Grading-the-Guidelines_2E00_.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2010/04/10/Grading-the-Guidelines_2E00_.aspx</id><published>2010-04-10T06:28:00Z</published><updated>2010-04-10T06:28:00Z</updated><content type="html">Over at the Construction forum there is a sticky post which list the suggested posting guidelines for asking question on that forum. First let me say the guidelines were my idea however I didn&amp;#39;t come up with them on my own. I had help. The idea sprang from the fact I was getting tired of repeatedly having to as each poster for the same bits of information. So the idea was to put these standard questions in a central highly visible location everyone could see. 
Where new poster could see what would...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2010/04/10/Grading-the-Guidelines_2E00_.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=63566" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Looking again at the Lookahead bug</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2009/02/21/Looking-again-at-the-Lookahead-bug.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2009/02/21/Looking-again-at-the-Lookahead-bug.aspx</id><published>2009-02-21T19:05:00Z</published><updated>2009-02-21T19:05:00Z</updated><content type="html">A few years ago a wrote about about a bug in Internet Explorer&amp;#39;s Regex engine that affected patterns with lookaheads. Well the bug came back in the form of a question on RegexAdvice.com. It too was a password regex, though not as complex as the previous pattern that introduced me to this bug. The first pattern had three conditions that were being tested for with lookaheads. ^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$ With the current pattern only one lookahead was being used. 
^(?=.*?\d)[a-z][a-z0-9]{5,7}$...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2009/02/21/Looking-again-at-the-Lookahead-bug.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=51059" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author><category term="Bugs/Quirks" scheme="http://regexadvice.com/blogs/mash/archive/tags/Bugs_2F00_Quirks/default.aspx" /></entry><entry><title>Validating Email Revisited </title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2008/09/28/Validating-Email-Revisited-.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2008/09/28/Validating-Email-Revisited-.aspx</id><published>2008-09-28T05:20:00Z</published><updated>2008-09-28T05:20:00Z</updated><content type="html">First off let me say I&amp;#39;m a bit over my head here. Not regex part but host the language of the regex engine. Many moons ago I posted a blog article stating why you could not write a regex that validated an e-mail address 100% . Well this is still true, however in that posted I also stated that the pattern was so massive that it wasn&amp;#39;t worth using. This is also still true however I was made aware of a flavor-specific syntax that reduces the regex from massive to very large. 
This regex is for...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2008/09/28/Validating-Email-Revisited-.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=46728" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author><category term="Email" scheme="http://regexadvice.com/blogs/mash/archive/tags/Email/default.aspx" /><category term="PCRE" scheme="http://regexadvice.com/blogs/mash/archive/tags/PCRE/default.aspx" /></entry><entry><title>Update to CSS Minification</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2008/04/27/Update-to-CSS-Minification.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2008/04/27/Update-to-CSS-Minification.aspx</id><published>2008-04-27T22:06:00Z</published><updated>2008-04-27T22:06:00Z</updated><content type="html">This is a C# 2.0 enhancement of a C# port of YUI Compressor &amp;#39;s CSS minification code I got a little carried away with ideas for this, they were all regex based which really is what motivated me to work on it. However after I thought I was done I learned not everything worked. It did what I wanted it to do but what I wanted wasn&amp;#39;t the correct thing. I really should have just stopped with my original ideas. 
The last idea for my original changes was to take 2 or more individual subset properties...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2008/04/27/Update-to-CSS-Minification.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=41665" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Follow up to Additional CSS minifying regex patterns</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2008/04/18/Follow-up-to-Additional-CSS-minifying-regex-patterns.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2008/04/18/Follow-up-to-Additional-CSS-minifying-regex-patterns.aspx</id><published>2008-04-18T20:04:00Z</published><updated>2008-04-18T20:04:00Z</updated><content type="html">OK, there regexes were discussed in the previous post this is mostly just their application. This is a C# 2.0 enhancement of a C# port of YUI Compressor &amp;#39;s CSS minification code Since I was doing this is C# I took full advantage of it&amp;#39;s regex engine, namely using lookbehinds and delegates for some replaces. Almost all the regexes after the &amp;quot;New Test&amp;quot; comment are the new or modified regexes from the ported version. 
There is also one new and two modified expressions before that comment....(&lt;a href="http://regexadvice.com/blogs/mash/archive/2008/04/18/Follow-up-to-Additional-CSS-minifying-regex-patterns.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=41530" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Additional CSS minifying regex patterns</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2008/03/27/Additional-CSS-minifying-regex-patterns.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2008/03/27/Additional-CSS-minifying-regex-patterns.aspx</id><published>2008-03-27T20:55:00Z</published><updated>2008-03-27T20:55:00Z</updated><content type="html">NOTE: All the regex referenced on this page written by me are using IgnoreCase = true I was looking at the regexes used in the YUI Compressor to minify CSS and came up with a couple of more that I think could help the process. The code and port I was looking at was already trimming unneeded zeros used for the top, right, bottom, left values with a simple string replace. But there were three separate replaces being done. 
It was pretty simple to come up with a regex to handle all the cases (Pseudo...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2008/03/27/Additional-CSS-minifying-regex-patterns.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=40757" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>A touch of Character Class</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2008/01/31/A-touch-of-Character-Class.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2008/01/31/A-touch-of-Character-Class.aspx</id><published>2008-01-31T16:49:00Z</published><updated>2008-01-31T16:49:00Z</updated><content type="html">The square brackets character class is one of the more misunderstood of the basic regex features. This feature is supported in virtually all regex implementations. In fact off the top of my head I don&amp;#39;t know an implementation that doesn&amp;#39;t support it. Maybe it&amp;#39;s not well documented in most tutorials or maybe the samples are not clear enough or maybe users are just skimming over the details for this one but I see this feature misused quite often. 
This type of character class simply matches...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2008/01/31/A-touch-of-Character-Class.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=39143" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Remember where you come from</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2007/10/01/Remember-where-you-come-from.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2007/10/01/Remember-where-you-come-from.aspx</id><published>2007-10-01T06:42:00Z</published><updated>2007-10-01T06:42:00Z</updated><content type="html">One thing I&amp;#39;ve noticed among rookie regex users is that they focus way too much on what they are trying to match and not on where they are trying to match it from. There is a tendency to drastically underestimate the importance of their source data, which they are trying to apply their regex against. I see this all the time on forums where posters asking for help almost never post any sample data, and most of the few that do post a sample they made up on the spot. Over on the RegexAdvice construction...(&lt;a href="http://regexadvice.com/blogs/mash/archive/2007/10/01/Remember-where-you-come-from.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=35277" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Are you ready for regex?</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx</id><published>2007-06-01T17:02:00Z</published><updated>2007-06-01T17:02:00Z</updated><content type="html">
&lt;p style="margin-bottom:0in;"&gt;Who should be using regular
expressions?&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;This has been on my mind for a while,
there are some people who shouldn&amp;#39;t be using regular expressions. People
who don&amp;#39;t know what regular expressions are for.[:^)]  Now you can use
regexes in a variety of ways, and you can debate either side on
whether certain applications are good uses or bad uses, but that&amp;#39;s
not what I&amp;#39;m talking about.  I&amp;#39;m not even talking about people who
don&amp;#39;t know how to write  regexes, well or at all.  I&amp;#39;m talking about
people trying to use regular expressions but have no idea what
regular expressions do.  And I don&amp;#39;t mean that you can&amp;#39;t decipher a
regex pattern by looking at it.  I mean you don&amp;#39;t understand the
general concept that regular expressions &lt;b&gt;&lt;u&gt;match&lt;/u&gt;&lt;/b&gt;&lt;span&gt;&lt;span style="text-decoration:none;"&gt;
patterns in data.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;I&amp;#39;ve seen several post over on the
regexadvice.com forums in the past few months which go something like
&amp;ldquo;I have this problem someone told me I should use a regular
expression&amp;rdquo; but unless that same someone bother to explain what a
regular expressions does or why it is suited for your task, don&amp;#39;t be
in such a hurry to plug in a regex.&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;  I&amp;#39;m not saying you need to master
regex before using them just get your head around the basics
(matching).  At the very least you should associate regex to wildcard
searches, just more powerful. If you are at least that far, proceed
with trying to implement your regex.  But if you are thinking
something completely different slow down there.  Unless you have a
basic understanding of what regular expressions do, even simple tasks
become way more difficult than they should be.  Even if you get
someone to help you write a regex and still don&amp;#39;t know what it&amp;#39;s
doing you are likely to continue to incorrectly try to use them. 
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;A common result from this lack
understanding I&amp;#39;ve been seeing is trying to perform a task solely via
writing a regex pattern, that regexes themselves have no capacity to
perform. Not that a regex couldn&amp;#39;t be a part of the solution but in
the end it&amp;#39;s not even what will do what the person wants done,
usually it a coding task, but the regex can at least do some of the
prep work of the task.  Even if some implementations of the regex
engine allow code to be called, it is still the code doing the heavy
lifting.  The regex is only finding the data.   One could save
themselves hours of wasted time if the just understand the basics of
what they are getting themselves into.&lt;/p&gt;
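&lt;p style="margin-bottom:0in;"&gt;As a rough sketch of that division of labor (Python&amp;#39;s &lt;code&gt;re&lt;/code&gt; module is my choice here, and the sample string is made up for illustration): the regex only finds the digits, and ordinary code does the comparing and adding.&lt;/p&gt;

```python
import re

# Hypothetical sample data; the format is an assumption for illustration.
text = "widgets: 12, gadgets: 7, gizmos: 30"

# The regex only finds runs of digits; it has no idea they are quantities.
found = re.findall(r"\d+", text)

# Ordinary code does the real work: converting, comparing, totalling.
quantities = [int(value) for value in found]
largest = max(quantities)
total = sum(quantities)

print(found)    # ['12', '7', '30']
print(largest)  # 30
print(total)    # 49
```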
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;Another thing I&amp;#39;ve seen, related to the
not having a basic understanding, is a task that is not only
perfectly suited for using a regex but  a very common application of
a regex task and the person asking the question asking &amp;ldquo;Is this
some that a regex can do?&amp;rdquo;  Of  the issues I&amp;#39;ve mention this one is
the most and the least understandable.  The most because regular
expressions are not simple to everyone, to some it comes easy to
others it never comes completely. And some of the documentation isn&amp;#39;t
the greatest, though there is plenty of good documentation out there these days.  So I can understand why some can&amp;#39;t get their head
about  exactly how to write a regex.  But it the least understandable
when they are asking how to write a regex so common, and usually
simple, that dozens of tutorials and/or articles on regexes use the
very regex they are asking for as an example.&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;  I think the main problems of people
who fall into this category is 
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;&lt;p style="margin-bottom:0in;"&gt;Lack of research.  There are more
	than enough tutorials out on the web and more than a few books that
	have simple samples to give you an idea of what a regex does. I find
	it very hard to believe when someone says they couldn&amp;#39;t find a regex
	for basic (US) zip codes. Did you even look? Or are you only using a
	regex for this task because someone told you that you should and ran out to find someone to write it for you? or are you just cheating on your homework? 
	&lt;/p&gt;
	&lt;/li&gt;&lt;li&gt;&lt;p style="margin-bottom:0in;"&gt;Misunderstanding what regular
	expressions understand.  There seems to be more than a few people
	who think a regex understands the context of the data it matches. 
	That it not only know how to match the data but it know what it
	means, either in context to their application or to the world at
	large. Those are the people that think a regex  for matching zip codes
	know what zip codes are used for and where they would be used. 
	Sorry but that&amp;#39;s not the case.  Regexes understand nothing of what
	your data means to you.  As far as the regex engine is concerned
	it&amp;#39;s just a string of characters.  It&amp;#39;s up to the regex author know
	the context of the data they want to match and to shape the regex
	accordingly to return only the relevant data.  This may make it seem
	like the regex understands the data it&amp;#39;s matching but that not what
	happening.  What happen is the person who wrote the regex understood
	the data and the problem set so well they were able to construct a
	regex the only match the relevant values.&lt;/p&gt;
	&lt;/li&gt;&lt;li&gt;&lt;p style="margin-bottom:0in;"&gt;That regex is a full blown
	programming language.  It&amp;#39;s not.  I&amp;#39;ve seen questions about wanting
	a regex to compare numbers, tell time or do some other function
	completely outside their realm but something most programming
	languages have a feature to deal with or let you write code that
	can.  I&amp;#39;ve never seen any regex documentation promoting such
	features so I can&amp;#39;t imagine why someone thinks a regex can perform
	these task.  Other than my previous point where they saw the results
	of a well written regex and speculated on what and how much work the
	regex did. Like I mentioned some implementation allow you to perform
	function calls but that is more an add-on of the programming
	environment you are using than a generic regex feature.  Realize
	that regular expressions are one of many features of your
	programming language, not the other way around. 
	&lt;/p&gt;
&lt;/li&gt;&lt;/ol&gt;
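&lt;p style="margin-bottom:0in;"&gt;To make the second point concrete, here is a minimal sketch in Python. The sample text and the &amp;ldquo;state code before the zip&amp;rdquo; rule are assumptions of mine for illustration, not a general recipe. A bare zip code pattern happily matches an order number, because to the engine both are just five digits; it only returns the relevant value once the author encodes the context.&lt;/p&gt;

```python
import re

# Hypothetical sample text; the order number happens to look like a zip code.
text = "Order 90210 shipped to Beverly Hills, CA 90210-1234"

# A bare zip pattern matches any five digits (plus optional +4), whatever they mean.
bare = re.findall(r"\b\d{5}(?:-\d{4})?\b", text)

# The author supplies the context: here, a two-letter state code before the zip.
# That shape is an assumption about this data, not something the regex "knows".
in_context = re.findall(r"\b[A-Z]{2}\x20(\d{5}(?:-\d{4})?)\b", text)

print(bare)        # ['90210', '90210-1234']
print(in_context)  # ['90210-1234']
```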
&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;
&lt;p style="margin-bottom:0in;"&gt;If you&amp;#39;ve read this far and you didn&amp;#39;t
know what regular expressions did or didn&amp;#39;t do before hopefully by
now you have some idea.  And if you already knew what they did just
stop to consider the next time you advise someone to use a regex, you
make sure you get across the high level point that you are &amp;ldquo;matching
something (a character pattern) in a string&amp;rdquo; before you get into
the more complex aspects of what a regex can do.&lt;/p&gt;
&lt;div class = "shareblock"&gt;&lt;strong&gt;Share this post:&lt;/strong&gt; &lt;a href = "mailto:?body=Thought you might like this: http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx&amp;amp;;subject=Are+you+ready+for+regex%3f" target="_blank" title = "Post http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx"&gt;email it!&lt;/a&gt; |  &lt;a href = "http://del.icio.us/post?url=http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx&amp;amp;;title=Are+you+ready+for+regex%3f" target="_blank" title = "Post http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx"&gt;bookmark it!&lt;/a&gt; |  &lt;a href = "http://www.digg.com/submit?url=http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx&amp;amp;;phase=2" target="_blank" title = "Post http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx"&gt;digg it!&lt;/a&gt; |  &lt;a href = "http://reddit.com/submit?url=http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx&amp;amp;title=Are+you+ready+for+regex%3f" target="_blank" title = "Post http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx"&gt;reddit!&lt;/a&gt; |  &lt;a href = "http://www.dotnetkicks.com/submit/?url=http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx&amp;amp;;title=Are+you+ready+for+regex%3f" target="_blank" title = "Post http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx"&gt;kick it!&lt;/a&gt; |  &lt;a href = "https://favorites.live.com/quickadd.aspx?marklet=1&amp;amp;;mkt=en-us&amp;amp;;url=http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx&amp;amp;;title=Are+you+ready+for+regex%3f&amp;amp;;top=1" target="_blank" title = "Post 
http://regexadvice.com/blogs/mash/archive/2007/06/01/Are-you-ready-for-regex_3F00_.aspx"&gt;live it!&lt;/a&gt;&lt;/div&gt;&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=30086" width="1" height="1"&gt;</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>You've got your sub-matches in my matches</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2007/06/01/You_2700_ve-got-your-sub_2D00_matches-in-my-matches.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2007/06/01/You_2700_ve-got-your-sub_2D00_matches-in-my-matches.aspx</id><published>2007-06-01T16:47:00Z</published><updated>2007-06-01T16:47:00Z</updated><content type="html">
&lt;p style="margin-bottom:0in;"&gt;Hello boys and girls.  Wow it&amp;#39;s been a
while since I&amp;#39;ve done this.  I want to touch on a very useful but
often overlooked feature of regex, grouping.  While I haven&amp;#39;t been
blogging I have been active on a message board here or there.  A
question I see quite often is &amp;ldquo;I want to find a match in a string
but I don&amp;#39;t want part of the match&amp;rdquo; or &amp;ldquo;I need the value of this
portion of the string&amp;rdquo;  Now often I see solutions to these type of
questions that involve look-arounds.  Any why they certainly work
they aren&amp;#39;t the only way to achieve the desired results. New regex
users seem to believe they can only access the full match.  Most
regex engine support groups, where in a match you can access a
certain portion of the full match.  Groups are identified by
parenthesis.  Every pair of plain parenthesis is a group.  For
example the regex pattern&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;&lt;code&gt;/&lt;span style="color:green;"&gt;(Hello)\x20(world)&lt;/span&gt;/&lt;/code&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;There are two groups in the regex.  The
regex itself matches the string &amp;ldquo;Hello world&amp;rdquo;, group 1 contains
the string &amp;ldquo;Hello&amp;rdquo;, group 2 contains the string &amp;ldquo;world&amp;rdquo;. 
Note neither group contains the space between the two words but it is
part of the full match.  Most implementations of regular expressions
allow you a ways to access these groups.  They are contained in a
collection inside the match object.  Now you&amp;#39;ll need to consult your
regex documentation to know the exact layout but in most of these
implementation there is a zero based collection where Item 0 is the
full match and Item 1..n is whichever group element your regex
contains, if any.&lt;/p&gt;
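&lt;p style="margin-bottom:0in;"&gt;As one concrete layout, here is a small sketch using Python&amp;#39;s &lt;code&gt;re&lt;/code&gt; module (my choice of implementation; yours may expose the collection differently). Index 0 is the full match and indexes 1 and 2 are the two groups.&lt;/p&gt;

```python
import re

# The same pattern as above, applied to the sample string.
match = re.search(r"(Hello)\x20(world)", "Hello world")

full = match.group(0)    # item 0: the full match, space included
first = match.group(1)   # group 1
second = match.group(2)  # group 2
captured = match.groups()  # only the groups, without item 0

print(full)      # Hello world
print(first)     # Hello
print(second)    # world
print(captured)  # ('Hello', 'world')
```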

&lt;p style="margin-bottom:0in;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;"&gt;Now if you notice I said every pair of
&lt;u&gt;plain&lt;/u&gt;&lt;span style="text-decoration:none;"&gt; parenthesis is a
group.  The reason I stress plain is because there are other group
constructs, the aforementioned look-arounds being some. Now support
for the other constructs vary in implementations so again consult
your regex documentation to see which ones you have. The other
grouping constructs consist of a open parenthesis immediately
followed by a question mark, which then is followed by the characters
that define that particular grouping construct.  Again consult your
documentation to see which characters define what.  I&amp;#39;m not going to
go into all of them here but they basically fall into two categories.
 Capturing and non-Capturing.  The plain parenthesis I&amp;#39;ve mentioned
above are a capturing group.  However capturing requires extra
resources in some case you need the extra speed,  but you still to
group a certain part of the pattern together for either necessity or
readability or both. This is where you&amp;#39;ll what to using a
non-capturing group (?:pattern), a open parenthesis followed
immediately by a question mark followed immediately by a colon.  The
difference here is that the data matched in the group is not add to
the collection of submatch in the Match object.&lt;/span&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Taking our
previous example and making the first group non-capturing&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;code&gt;/&lt;span style="color:green;"&gt;(?:Hello)\x20(world)&lt;/span&gt;/&lt;/code&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Where before we
had two groups here we only have one.  Group 1 contains the
string&amp;rdquo;world&amp;rdquo;.  Now this example is not a very practical use of a
non-capturing group.  Typically you&amp;#39;d use them in more complex
regexes that have a grouping but you really don&amp;#39;t care about the
sub-matches.&lt;/p&gt;
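&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;A quick way to check this for yourself, sketched here with Python&amp;#39;s &lt;code&gt;re&lt;/code&gt; module (an assumption on my part; other engines report group counts in their own way): the compiled pattern reports a single capturing group, and that group holds &amp;ldquo;world&amp;rdquo;.&lt;/p&gt;

```python
import re

pattern = re.compile(r"(?:Hello)\x20(world)")
match = pattern.search("Hello world")

group_count = pattern.groups  # number of capturing groups in the pattern
full = match.group(0)         # the full match still includes "Hello"
captured = match.groups()     # only the capturing group is collected

print(group_count)  # 1
print(full)         # Hello world
print(captured)     # ('world',)
```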

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;One more quick
thing about capturing groups basically each left parenthesis is the
index of the sub-match in the groups collection.  So if you have
nested parenthesis count every (plain) left parenthesis to know which
index to use to reference it. Some of the advance grouping constructs
and regex options can affect the ordering but if you are using them
hopefully you&amp;#39;ve read their effects so I won&amp;#39;t go over that here.&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;code&gt;/&lt;span style="color:green;"&gt;((Hello)|(Goodbye
Cruel))\x20(world)&lt;/span&gt;/&lt;/code&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;The above regex
has 4 capturing groups (not counting group 0). Can you find them? 
Now it should match either the string &amp;ldquo;Hello world&amp;rdquo; or &amp;ldquo;Goodbye
Cruel world&amp;rdquo;  Now I want to point out that not all the groups will
participate in the match, but the are still part of the Groups
collection.  There will always be 4 groups, just one will always be
empty.  Which one depends on which string was matched.&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;If &amp;ldquo;Hello
world&amp;rdquo; was matched the groups are&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Hello&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Hello&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;(empty)&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;world&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;If &amp;ldquo;Goodbye
Cruel world&amp;rdquo; was matched the groups are&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Goodbye
	Cruel&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;(empty)&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Goodbye
	Cruel&lt;/p&gt;
	&lt;/li&gt;
&lt;li&gt;
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;world&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;in both case
group 0 would be the full match&lt;/p&gt;
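&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;The two lists above can be reproduced with a short sketch in Python&amp;#39;s &lt;code&gt;re&lt;/code&gt; module (again just one implementation, chosen for illustration). One difference to be aware of: Python reports a group that did not participate as &lt;code&gt;None&lt;/code&gt; rather than as an empty string, so the sketch passes &lt;code&gt;default=""&lt;/code&gt; to get the &amp;ldquo;(empty)&amp;rdquo; entries shown above.&lt;/p&gt;

```python
import re

# Four capturing groups, numbered by the position of each left parenthesis.
pattern = re.compile(r"((Hello)|(Goodbye Cruel))\x20(world)")

hello = pattern.search("Hello world")
goodbye = pattern.search("Goodbye Cruel world")

# default="" replaces non-participating groups (None in Python) with "".
hello_groups = hello.groups(default="")
goodbye_groups = goodbye.groups(default="")

print(hello_groups)    # ('Hello', 'Hello', '', 'world')
print(goodbye_groups)  # ('Goodbye Cruel', '', 'Goodbye Cruel', 'world')
```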

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;If you note in
both case two groups contain the same value. Even if you need to know
whether &amp;ldquo;Hello&amp;rdquo; or &amp;ldquo;Goodbye Cruel&amp;rdquo; was match, you certainly
don&amp;#39;t need to know it twice.  Plus the inner parenthesis have
different index you&amp;#39;d have to check if you want to use those.  This
is where you&amp;#39;d use the non-capturing group to simplify your groups
collection.&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;code&gt;/&lt;span style="color:green;"&gt;((?:Hello)|(?:Goodbye
Cruel))\x20(world)&lt;/span&gt;/&lt;/code&gt;&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Now we are back
down to two groups.  Group 1 contains either &amp;ldquo;Hello&amp;rdquo; or &amp;ldquo;Goodbye
Cruel&amp;rdquo; depending on which string was matched.  Group 2 always
contains &amp;ldquo;world&amp;rdquo;.&lt;/p&gt;
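
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;As a rough check in Python (collected_groups is a hypothetical helper name), the non-capturing version leaves only two entries in the groups collection:&lt;/p&gt;

&lt;pre&gt;
```python
import re

# The non-capturing version of the pattern shown above.
PATTERN = re.compile(r"((?:Hello)|(?:Goodbye Cruel))\x20(world)")


def collected_groups(text):
    """Return the tuple of capturing groups (group 0 excluded)."""
    return PATTERN.search(text).groups()


hello_groups = collected_groups("Hello world")
goodbye_groups = collected_groups("Goodbye Cruel world")
print(hello_groups)
print(goodbye_groups)
```
&lt;/pre&gt;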

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt; However, keep in
mind that in some cases you&amp;#39;ll want to use the inner index to determine
which group was matched.   So using non-capturing groups isn&amp;#39;t
necessarily better; it just depends on whether you need to access
those groups or not. But if you are not doing anything with them,
don&amp;#39;t capture them.&lt;/p&gt;
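
&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;Here is a minimal sketch of that case, assuming the capturing version of the pattern and a hypothetical which_greeting helper. It uses the inner groups to tell which alternative matched:&lt;/p&gt;

&lt;pre&gt;
```python
import re

# Assumed capturing version; groups 2 and 3 are the inner alternatives.
PATTERN = re.compile(r"((Hello)|(Goodbye Cruel))\x20(world)")


def which_greeting(text):
    """Return 'hello' or 'goodbye' for a match, or None when nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Exactly one of the two inner groups participates in a successful match.
    if m.group(2) is not None:
        return "hello"
    return "goodbye"


print(which_greeting("Hello world"))
print(which_greeting("Goodbye Cruel world"))
```
&lt;/pre&gt;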

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;These are just
two of the basic grouping constructs. They are generally supported
across implementations of regex, but not always. Where they are, you
can use them to easily dissect larger matches.&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p style="margin-bottom:0in;text-decoration:none;"&gt;&lt;br /&gt;
&lt;/p&gt;
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Named Groups to the Rescue</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2005/09/28/12925.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2005/09/28/12925.aspx</id><published>2005-09-28T20:06:00Z</published><updated>2005-09-28T20:06:00Z</updated><content type="html">


&lt;p class="MsoNormal"&gt;I was asked to modify some text that had been built
incorrectly: basically, insert some text at a certain point. First I use a regex
to find the text, then insert the new value within that match. Since the inserted value goes inside the
matched text, I simply wanted to use backreferences and the replace method. Simple, right?
Well, not so much.&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Now the text is in a field of various rows of a database table, and the
text to be inserted comes from another field in the same row and is an
alphanumeric value. The inserted
value is dynamic, so I can’t simply hard code the replacement text; the
replacement text has to be built for each row. The text to be modified is a certain
attribute somewhere in the text. For this
example let’s say it’s “id=xyz”, which is constant for all records. The new text will be inserted right after
the equals sign.&lt;/p&gt;


&lt;p class="MsoNormal"&gt;So for&lt;/p&gt;




&lt;p class="MsoNormal"&gt;source&lt;span&gt;&amp;nbsp; &lt;/span&gt;=“ {some stuff}
id=xyz {more stuff}”&lt;br&gt;
newText = “ab1”&lt;/p&gt;




&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;br&gt;
you get&lt;/p&gt;




&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;br&gt;
“{some stuff} id=&lt;font color="#ff0000"&gt;ab1&lt;/font&gt;xyz {more stuff}”&lt;/p&gt;




&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp; &lt;br&gt;
&lt;/o:p&gt;&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Simple enough. You
use this regex, \bid=xyz\b, to match the
text. Then split it into groups so you can use backreferences in the
replace. So your final regex looks like
this:&lt;/p&gt;




&lt;p class="MsoNormal"&gt;\b(id=)(xyz)\b&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Now group 1 contains the text up to your insertion point
(id=)&lt;/p&gt;


&lt;p class="MsoNormal"&gt;and group 2 contains the text after your insertion point (xyz).&lt;/p&gt;




&lt;p class="MsoNormal"&gt;So your replacement string for your regex is “$1&lt;font color="#008000"&gt;(new data
goes here)&lt;/font&gt;$2”, where &lt;font color="#008000"&gt;(new data goes here)&lt;/font&gt; is some alphanumeric value pulled
from a second field in the row.&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Doing this in .Net, my code looked something like this pseudo-code:&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Regex regexFind = new Regex(@"\b(id=)(xyz)\b");&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Get Records&lt;/p&gt;


&lt;p class="MsoNormal"&gt;For each row&lt;/p&gt;


&lt;p class="MsoNormal"&gt;&lt;span&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt; fieldA
= rowFieldA&lt;span&gt;&amp;nbsp; &lt;/span&gt;(source text)&lt;/p&gt;


&lt;p class="MsoNormal"&gt;&lt;span&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt; fieldB
= rowFieldB (insert value)&lt;/p&gt;


&lt;p class="MsoNormal"&gt;&lt;span&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;fieldA =
regexFind.Replace(fieldA, String.Format("$1{0}$2", fieldB))&lt;/p&gt;


&lt;p class="MsoNormal"&gt;next&lt;/p&gt;




&lt;p class="MsoNormal"&gt;The String.Format method creates a replacement string
for each row.&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Looks good? It works fine
for our example, but there is a problem.&lt;/p&gt;




&lt;p class="MsoNormal"&gt;For our example value of “ab1” the string format produces “$1ab1$2”, which is
exactly what we want. But this field is alphanumeric, so it could begin with a
digit, which causes a problem. Say
for the next record the value of the text to be inserted is “12a”. The format
method produces a replacement string of “$112a$2”, which is not good. Syntactically it’s fine, but it’s not what we
want: instead of inserting some text between group 1 and group 2,
it tries to insert text between group 112 and
group 2. As there is no group 112, the engine
assumes $112 is literal text, so your final result is “id=$112axyz”.&lt;/p&gt;
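
&lt;p class="MsoNormal"&gt;The same kind of ambiguity can be reproduced outside .Net. The sketch below is an illustration in Python, not the code from this post: Python writes numbered backreferences as \1 and \2, naive_insert is a hypothetical helper, and the exact way the replacement goes wrong is specific to Python rather than the literal $112 described above.&lt;/p&gt;

&lt;pre&gt;
```python
import re

FIND = re.compile(r"\b(id=)(xyz)\b")


def naive_insert(source, new_text):
    """Glue the new text between two numbered backreferences."""
    template = "\\1" + new_text + "\\2"
    try:
        return FIND.sub(template, source)
    except re.error:
        # The engine rejected the ambiguous template outright.
        return None


source = "{some stuff} id=xyz {more stuff}"
print(naive_insert(source, "ab1"))  # a value starting with a letter is fine
print(naive_insert(source, "12a"))  # a leading digit changes what \1 means
```
&lt;/pre&gt;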




&lt;p class="MsoNormal"&gt;OK, this is where named groups become handy (necessary?). If you use named groups in your regex and
replacement string you can avoid this problem.&lt;/p&gt;






&lt;p class="MsoNormal"&gt;Change the regex to \b(&lt;font color="#ff0000"&gt;?&amp;lt;att&amp;gt;&lt;/font&gt;id=)(&lt;font color="#ff0000"&gt;?&amp;lt;val&amp;gt;&lt;/font&gt;xyz)\b&lt;/p&gt;




&lt;p class="MsoNormal"&gt;And your replacement string to “$&lt;font color="#ff0000"&gt;{att}&lt;/font&gt;(new data goes here)$&lt;font color="#ff0000"&gt;{val}&lt;/font&gt;”&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Now if you are using the String.Format method there is one
more hoop you have to jump through. Because the regex engine and the Format method both use curly braces, there is a
conflict and the Format method will complain. You can double the braces ({{ and }}) to escape them, or pass them in as arguments like this:&lt;/p&gt;




&lt;p class="MsoNormal"&gt;String.Format("${0}att{1}{2}${0}val{1}","&lt;font color="#008000"&gt;{&lt;/font&gt;","&lt;font color="#ff0000"&gt;}&lt;/font&gt;",newValue)&lt;o:p&gt; &lt;br&gt;
&lt;/o:p&gt;&lt;/p&gt;




&lt;p class="MsoNormal"&gt;To get the desired replacement string.&lt;br&gt;
&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;


&lt;p class="MsoNormal"&gt;When I started writing this I thought this was the only way
to get this to work, which would mean you could only solve this with a regex engine, like .Net,
that supports named groups, but I’ve since thought of a second way.&lt;br&gt;
&lt;/p&gt;
&lt;p class="MsoNormal"&gt;Still, whenever a group in your replacement string
can be followed by a digit, you may want to consider using named groups
to avoid unpleasant surprises.&lt;br&gt;
&lt;/p&gt;
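
&lt;p class="MsoNormal"&gt;For engines without named groups, one possible workaround is to build the replacement in a function instead of a template string, so a digit in the inserted value can never be read as part of a group number. This is a different technique from named groups and not necessarily the second way mentioned above. It is only a sketch in Python, and insert_value is a hypothetical name.&lt;/p&gt;

&lt;pre&gt;
```python
import re

FIND = re.compile(r"\b(id=)(xyz)\b")


def insert_value(source, new_text):
    """Insert new_text between group 1 and group 2 of every match."""
    def build(match):
        # Plain string concatenation: no template, so nothing to misparse.
        return match.group(1) + new_text + match.group(2)
    return FIND.sub(build, source)


source = "{some stuff} id=xyz {more stuff}"
print(insert_value(source, "ab1"))
print(insert_value(source, "12a"))
```
&lt;/pre&gt;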


&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;


&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;br&gt;
&lt;/o:p&gt;&lt;/p&gt;
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Making your regex 
code ready.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2005/05/18/934.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2005/05/18/934.aspx</id><published>2005-05-18T16:44:00Z</published><updated>2005-05-18T16:44:00Z</updated><content type="html">&lt;p class="MsoNormal"&gt;There are times when a regular expression you’ve written, or
someone has written for you, needs a little tweaking before you add it to your code,
because the syntax of the language conflicts with
your regex. For example, part of
your regex pattern may contain a double quote while the language you are using uses
double quotes as string delimiters. If
you just cut and paste the pattern into your code, the pattern’s quotation mark will
terminate your string prematurely. The common way to fix it is to escape the quotation mark in the pattern. This solution
requires altering the regex, and how the character is escaped depends on the
language being used. The regex itself
allows you to escape a character with the \ character. The language being used may or may not
recognize that as the escape character for its syntax. And it may be confusing later when you look
at the regex and can’t remember why you escaped a character that the pattern
itself doesn’t need escaped. But there is another way: hex values.&lt;/p&gt;




&lt;p class="MsoNormal"&gt;Most regex implementations support a hex syntax, \x##, where each # is a hex digit.&lt;/p&gt;


&lt;p class="MsoNormal"&gt;So if you use \x22 instead of a double quote and \x27 for
a single quote, your regexes become more cut-and-paste ready.&lt;/p&gt;
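
&lt;p class="MsoNormal"&gt;As a small illustration in Python (find_quoted is a made-up helper), the pattern below matches quoted strings without containing a literal quote character, so it can be pasted inside either kind of string delimiter unchanged:&lt;/p&gt;

&lt;pre&gt;
```python
import re

# \x22 is the double quote and \x27 is the single quote, so the pattern
# text itself contains neither character.
QUOTED = re.compile(r"(?:\x22[^\x22]*\x22|\x27[^\x27]*\x27)")


def find_quoted(text):
    """Return every single-quoted or double-quoted run found in text."""
    return QUOTED.findall(text)


found = find_quoted('say "hi" or \'bye\' now')
print(found)
```
&lt;/pre&gt;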


&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Another useful hex value is \x20, which is a space. This is especially useful in .Net, where
there is a regex option, IgnorePatternWhitespace, that ignores white space in the pattern. Turning this option on allows end-of-line comments
in the regex but, except inside a character class, ignores spaces typed
into the pattern, which is a problem if a space is part
of the pattern to match. So you could
break a working regex if you later decide to add this option. This happened on
RegexLib when the option was first turned on. A lot of patterns that were written before
the switch was flipped suddenly stopped working.&lt;/p&gt;
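
&lt;p class="MsoNormal"&gt;The effect is easy to reproduce. The sketch below uses Python’s re.VERBOSE flag as a stand-in for the .Net option, which is an assumption about equivalence for this simple case only, and matches is a hypothetical helper:&lt;/p&gt;

&lt;pre&gt;
```python
import re


def matches(pattern, text, verbose):
    """Report whether pattern is found in text, optionally in verbose mode."""
    flags = re.VERBOSE if verbose else 0
    return re.search(pattern, text, flags) is not None


literal_space = r"Hello world"    # typed space: dropped in verbose mode
hex_space = r"Hello\x20world"     # hex space: kept in either mode

for pattern in (literal_space, hex_space):
    print(pattern, matches(pattern, "Hello world", False),
          matches(pattern, "Hello world", True))
```
&lt;/pre&gt;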


&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;


&lt;p class="MsoNormal"&gt;Speaking of .Net, when it comes to named groups you can’t use
the hex notation to define the group name using the single quote syntax. However, you can avoid any issue with single
quotes by using the alternate angle-bracket syntax, (?&amp;lt;name&amp;gt;...).&lt;/p&gt;
</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry><entry><title>Word Break.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/mash/archive/2005/02/09/324.aspx" /><id>http://regexadvice.com/blogs/mash/archive/2005/02/09/324.aspx</id><published>2005-02-09T18:17:00Z</published><updated>2005-02-09T18:17:00Z</updated><content type="html">
The list-detail design is very commonly used with web pages, where you have a list of links that lead to more detailed information on each entry.  Sometimes the text of the list is simply a snippet of a much longer string of text in the detail.  A common way to handle this is to use a string function to return the first n characters of the string and display that in the list.  The problem with this is that it tends to make the break right in the middle of a word, which isn’t a major problem but can be aesthetically displeasing or may accidentally form another word you didn’t mean to put on your site. 

When facing this issue I came up with a simple regex to allow me to break on whole words.

^(?:[ -~]{n,m}(?:$|(?:[\w!?.])\s))


Where n = the minimum number of characters to match
and m = the maximum number of characters the character class may match (the trailing word character and white space can add up to two more to the full match). 

Now in this instance I’m considering a word to be one or more ASCII non-white-space characters.  The way it works is that after matching n ASCII characters it tries to match either the end of the string or a word character or sentence-ending punctuation followed by a white space.  So it will accept as many characters, including white spaces, as it can up to m and still satisfy the rest of the match.  Otherwise it backtracks until the regex is satisfied. So if you wanted a minimum of 2 characters and a maximum of 75 the regex would be

^(?:[ -~]{2,75}(?:$|(?:[\w!?.])\s))

and if you applied it to the Gettysburg Address

“Four score and seven years ago, our fathers brought forth upon this continent a new nation: conceived in liberty, and dedicated to the proposition that all men are created equal.
…” (only the 1st paragraph shown for the example but you could apply the full text)

Taking the first match you get

“Four score and seven years ago, our fathers brought forth upon this ”


There are a few problems with the regex that can be improved.  First off, it only accepts basic displayable ASCII characters, decimal 32 to 126, which means the text must be in that range.  I did it this way because it gives you the US alphabet, digits and commonly used symbols and punctuation, which was all I needed at the time. Other characters would need to be added.  Also, if the first word’s character count exceeds your maximum length, no match will be found.


 You can make this regex a little more dynamic by putting it inside a function that takes your string and the min and max values as input.
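
A minimal sketch of such a function in Python, assuming the same pattern as above. The name break_on_word and its parameter names are made up for illustration, and it returns None when no match is found.

```python
import re


def break_on_word(text, min_len, max_len):
    """Return the leading snippet of text that ends on a whole word, or None."""
    # Same pattern as above, with n and m filled in from the arguments.
    pattern = r"^(?:[ -~]{%d,%d}(?:$|(?:[\w!?.])\s))" % (min_len, max_len)
    m = re.match(pattern, text)
    return m.group(0) if m else None


address = ("Four score and seven years ago, our fathers brought forth upon "
           "this continent a new nation: conceived in liberty, and dedicated "
           "to the proposition that all men are created equal.")
snippet = break_on_word(address, 2, 75)
print(repr(snippet))
```

With the values 2 and 75 this should reproduce the snippet shown earlier, trailing space included.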

</content><author><name>mash</name><uri>http://regexadvice.com/members/mash.aspx</uri></author></entry></feed>