<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://regexadvice.com/utility/FeedStylesheets/atom.xsl" media="screen"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><title type="html">Justin's Regex Blog</title><subtitle type="html"> Thinking in Regex</subtitle><id>http://regexadvice.com/blogs/justin_rogers/atom.aspx</id><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/default.aspx" /><link rel="self" type="application/atom+xml" href="http://regexadvice.com/blogs/justin_rogers/atom.aspx" /><generator uri="http://communityserver.org" version="2.1.60809.935">Community Server</generator><updated>2004-05-21T23:55:00Z</updated><entry><title>Examining the data flow model of using DateTime as a Parser for Dates</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx</id><published>2004-08-15T01:55:00Z</published><updated>2004-08-15T01:55:00Z</updated><content type="html">&lt;P&gt;Yet another diagram here. This time we get to see some differences between a parser/validator model where the pieces are separated for purposes of providing enhanced user feedback and more specialized return structures. The end result is that we are able to provide many more services with about the same amount of work. This is the best all-around option because it means a relatively small amount of code, requires the least amount of end-user knowledge (even less so than the regular expressions option), and the abstract mapping is appropriate for nearly any programming language.&lt;/P&gt;
&lt;P&gt;&lt;IMG src="http://www.games4dotnet.com/images/Blogging/JustinRogers/DateTimeParserDataFlowModelForDates.JPG" P &lt;&gt;
&lt;P&gt;&lt;STRONG&gt;User Input:&lt;BR&gt;&lt;/STRONG&gt;You start to see the first benefits to a parser/validator model in that the input can now take multiple formats. In general, parsing a string into a date not only involves making sure the string is in the appropriate format, but also checking the numeric values of portions of the string. The string parsing model still exists through the use of the &lt;U&gt;Parse&lt;/U&gt; or &lt;U&gt;ParseExact&lt;/U&gt; method, but there are other options. You can specify already parsed integers when creating a DateTime or optionally pre-split string portions. You'd never enable these features through expressions because you'd wind up coding multiple expressions, processing the strings into integers earlier in the data flow model, and writing a bunch of code to select which expressions to use for validation. Not that you couldn't enable these features, just that it doesn't buy you anything.
&lt;P&gt;&lt;STRONG&gt;Parse:&lt;BR&gt;&lt;/STRONG&gt;I put some range checking notes under parsing. This happens because of basic validation of month, day, and year ranges that occurs during the parse or construction phases of a DateTime. If you were writing a light DateTime parser what you might do is leave the range checking in, but rather than throw exceptions provide more feedback. The nice thing about the DateTime integrated range validation is that the values are parsed into strongly typed integers and a basic range check is done on those integers, rather than working at the character comparison level.
&lt;P&gt;The parsing of a DateTime is much faster than the character level parsing of a regular expression even though more work is being done towards the end result in terms of getting the data from the least usable format (a string of several values) to the most usable format (a numeric type). In fact even with conversion in place the parsing is much more functional (the possibility of more format strings) and robust.
&lt;P&gt;&lt;STRONG&gt;Result:&lt;BR&gt;&lt;/STRONG&gt;Unfortunately the result of a parsed or constructed DateTime has only two possibilities. Either you get a somewhat valid date returned (still needs additional validation) or you get an exception. If you get the exception it is just like a false return value from the Expression processor. To avoid the exception a TryParse in Whidbey will return a boolean true/false depending on parsing success or failure. You still get the same strongly typed parsed results, just without the exception in the failure case.
&lt;P&gt;The result here is massively different from an Expression result. The reason is that each field is now in a numeric format and has been preconverted. It is ready for work to be done, since undoubtedly you'll want to do some work. It is also in the most optimal storage format in terms of size. If you get to this point there were no parsing errors, but there may still be other invalid ranges within the dates.
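&lt;P&gt;To make the two result paths concrete, here is a minimal sketch, assuming .NET 2.0 (Whidbey) or later where DateTime.TryParseExact is available. The "MM/dd/yyyy" format string and the helper name Describe are just my choices for illustration.&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;
using System.Globalization;

static class ParseResultDemo
{
    // Hypothetical helper: reports which path the parse took.
    public static string Describe(string input)
    {
        DateTime parsed;
        // TryParseExact returns false on failure instead of throwing.
        bool ok = DateTime.TryParseExact(
            input, "MM/dd/yyyy", CultureInfo.InvariantCulture,
            DateTimeStyles.None, out parsed);

        if (!ok)
        {
            return "parse failed";
        }

        // The fields are already strongly typed integers.
        return "year=" + parsed.Year + " month=" + parsed.Month + " day=" + parsed.Day;
    }

    static void Main()
    {
        Console.WriteLine(Describe("08/14/2004"));
        // February 30 fails the built-in range check.
        Console.WriteLine(Describe("02/30/2004"));

        try
        {
            // ParseExact signals the same failure with an exception.
            DateTime.ParseExact("02/30/2004", "MM/dd/yyyy", CultureInfo.InvariantCulture);
        }
        catch (FormatException)
        {
            Console.WriteLine("ParseExact threw FormatException");
        }
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;Either way the caller only learns that the parse failed, not which field caused it, which is the black-box problem discussed in the conclusion.&lt;/P&gt;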
&lt;P&gt;&lt;STRONG&gt;Validate:&lt;BR&gt;&lt;/STRONG&gt;Some of the validation has already occurred in the parsing phase, but the extra validation comes in now. By using the strongly typed results you can quickly perform comparisons on the fields to remove days that never existed within a particular calendar. In fact it becomes possible to chain various validators on the end of the process depending on the level of checking that needs to apply. We have several options at this juncture. We can enforce historical accuracy by removing various days that are valid at the parser level but shouldn't be allowed at a historical accuracy level. We can also enforce programmatic ranges such as only allowing dates after 2000.
&lt;P&gt;Each validator in the chain gets a chance to flag an error. This can provide the user with great granularity in user feedback.
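&lt;P&gt;A rough sketch of what such a chain could look like. The delegate and method names are hypothetical, and the historical rule is only an example: it assumes a locale that dropped October 5 through 14 when it adopted the Gregorian calendar in 1582, which DateTime itself happily accepts.&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;
using System.Collections;

// One link in the chain: returns null when the date passes, or a message when it fails.
delegate string DateValidator(DateTime value);

static class ValidatorChainDemo
{
    // Programmatic range: only dates in the year 2000 or later are allowed.
    public static string AfterYear2000(DateTime value)
    {
        return value.Year >= 2000 ? null : "Date must be in the year 2000 or later.";
    }

    // Historical accuracy (illustrative assumption): October 5-14, 1582 never
    // happened where the Gregorian calendar was adopted that month.
    public static string NotInGregorianGap(DateTime value)
    {
        bool inGap = value >= new DateTime(1582, 10, 5)
            ? new DateTime(1582, 10, 14) >= value
            : false;
        return inGap ? "That day was skipped in the 1582 calendar change." : null;
    }

    // Runs every validator and collects each message, so the user sees every failure.
    public static ArrayList Validate(DateTime value, params DateValidator[] chain)
    {
        ArrayList errors = new ArrayList();
        foreach (DateValidator validator in chain)
        {
            string error = validator(value);
            if (error != null)
            {
                errors.Add(error);
            }
        }
        return errors;
    }

    static void Main()
    {
        DateTime candidate = new DateTime(1582, 10, 10);
        foreach (string error in Validate(candidate, NotInGregorianGap, AfterYear2000))
        {
            Console.WriteLine(error);
        }
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;Because each validator reports on its own, adding or removing a rule doesn't touch the parser at all.&lt;/P&gt;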
&lt;P&gt;&lt;STRONG&gt;User Feedback:&lt;BR&gt;&lt;/STRONG&gt;Tell the user where the process failed. There is a lot of work happening on the date and a number of different things that can go wrong. Arguably if you are only asking for a basic date or you expect your users to be smart maybe this doesn't matter to you. However, if you are enforcing odd constraints, such as historical accuracy then user feedback becomes much more important. &lt;/P&gt;
&lt;P&gt;Even more important than giving the feedback to the user is having the ability to give the user the feedback. You might use a once over expression to give you a true/false, but that result doesn't allow you to determine enough information to figure out why. In fact you'd have to reparse the string again just to figure out why. Documentation doesn't fix this problem because if I give you a list of ten things that go wrong, but I only give you enough information to tell you that ONE of them went wrong, you still have to figure out which of the ten happened. Originally the .NET Terrarium just told you if your assembly was valid or invalid, but it didn't tell you why. We added reporting because there are about 15 different things that can happen. This allowed us as programmers to test the validation logic. It also allowed end users to fix their own creature code by telling them exactly what was wrong with their code.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Conclusion:&lt;BR&gt;&lt;/STRONG&gt;We are almost to the point of having a truly functional date-time parser. At this level there is still a black-box at the parser level. We can't get an internal peek at which values are missing at the parser level nor do we get a look at the range checking. It would be nice to extend these values to the user. After all, we are doing the range checking already, we are doing the work in order to fail the parsing, there isn't any reason once the parsing has already failed not to give an indication of the condition of failure. A light date-time parser is next. This is a true parser and it gives actual feedback at all steps in the process. From an end-user perspective it is the most functional. It is also most functional as a reusable component for developers because it provides them the fastest parsing with more feedback than any of the other methods. The trade-off is programming complexity, but I can assure you that basic parsing routines require only the most basic programming constructs. There won't be anything strange in this process whatsoever.&lt;/P&gt;
&lt;div class = "shareblock"&gt;&lt;strong&gt;Share this post:&lt;/strong&gt; &lt;a href = "mailto:?body=Thought you might like this: http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx&amp;amp;;subject=Examining+the+data+flow+model+of+using+DateTime+as+a+Parser+for+Dates" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx"&gt;email it!&lt;/a&gt; |  &lt;a href = "http://del.icio.us/post?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx&amp;amp;;title=Examining+the+data+flow+model+of+using+DateTime+as+a+Parser+for+Dates" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx"&gt;bookmark it!&lt;/a&gt; |  &lt;a href = "http://www.digg.com/submit?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx&amp;amp;;phase=2" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx"&gt;digg it!&lt;/a&gt; |  &lt;a href = "http://reddit.com/submit?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx&amp;amp;title=Examining+the+data+flow+model+of+using+DateTime+as+a+Parser+for+Dates" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx"&gt;reddit!&lt;/a&gt; |  &lt;a href = "http://www.dotnetkicks.com/submit/?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx&amp;amp;;title=Examining+the+data+flow+model+of+using+DateTime+as+a+Parser+for+Dates" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx"&gt;kick it!&lt;/a&gt; |  &lt;a href = "https://favorites.live.com/quickadd.aspx?marklet=1&amp;amp;;mkt=en-us&amp;amp;;url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx&amp;amp;;title=Examining+the+data+flow+model+of+using+DateTime+as+a+Parser+for+Dates&amp;amp;;top=1" target="_blank" title = "Post 
http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/340.aspx"&gt;live it!&lt;/a&gt;&lt;/div&gt;&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=340" width="1" height="1"&gt;</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /><category term="Lexing and Parsing" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Lexing+and+Parsing/default.aspx" /></entry><entry><title>Examining the data flow model of using Regular Expressions for Dates</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx</id><published>2004-08-14T20:18:00Z</published><updated>2004-08-14T20:18:00Z</updated><content type="html">&lt;P&gt;I think some diagrams are going to be extremely important for examining the problems of using regular expressions for validating and parsing dates. Especially as you start to rely on more and more complex and time consuming operations to validate additional constraints. The following diagram is what I call the Expression Data Flow Model for Dates. It isn't complex, it just shows the abstract processing from user input to result all the way through processing. I'll use the various blocks to talk about each step in turn. I've also provided a sample run of a sample input and expression along with the result and processing the output.&lt;/P&gt;
&lt;P&gt;&lt;IMG src="http://www.games4dotnet.com/images/Blogging/JustinRogers/ExpressionDataFlowModelForDates.jpg"&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;User Input:&lt;BR&gt;&lt;/STRONG&gt;This is where the user is sending data through your data flow pipeline. By using a regular expression you limit yourself to processing string input. This isn't all that bad since most date and time input on the web is done through a textual or string based field. More recently models have changed to supplying the user with a calendar input. Under these scenarios you can toss the expression out altogether and process the separated UI components of the calendar control itself.&lt;/P&gt;
&lt;P&gt;You'll see how separating the parser/validator logic later will enable us to short-circuit several of the steps involved in the data flow model.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Expression Block:&lt;BR&gt;&lt;/STRONG&gt;The expression is a parser, as it will be splitting portions of the input string and placing them into groups (aka variables or fields). It is also a validator because it is verifying that specific characters are digits and possibly digits within specific ranges. It also verifies digit counts, separator characters, etc... If you want to add additional validation logic it has to be placed into the expression itself. Without ingeniously designing the expression the result or output is going to be very fixed with respect to the amount of information you get back.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Result:&lt;BR&gt;&lt;/STRONG&gt;Once the expression has run you get a single boolean result value of true/false to determine if the pattern was successfully parsed AND validated. You don't know what part of the process failed, only that it failed. There are no partial groups filled so you can't verify that it failed at a specific location in the expression (without ingeniously engineering the expression).&lt;/P&gt;
&lt;P&gt;If the pattern does succeed then you have a number of groups or captures that identify the results. Notice that the entire match is stored in group 0 while your capture groups are stored in 1 through 3. The problem that you'll face is that you've ALREADY validated the digits using the expression, you basically&amp;nbsp;performed more than half the work of converting the digits into an actual strongly typed piece of data, but it is STILL returned to you as a string. If you want to continue moving forward and working with the fields themselves you have to do some processing.&lt;/P&gt;
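&lt;P&gt;Here is a small sketch of that extra conversion step. The pattern is a deliberately simple month/day/year stand-in that I made up for this example, not the expression from the diagram, and ToFields is a hypothetical helper.&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;
using System.Text.RegularExpressions;

static class ExpressionResultDemo
{
    // Illustrative pattern only: month/day/year with three capture groups.
    static readonly Regex DatePattern =
        new Regex("^([0-9]{2})/([0-9]{2})/([0-9]{4})$");

    // Returns { month, day, year } as integers, or null when the pattern fails.
    public static int[] ToFields(string input)
    {
        Match match = DatePattern.Match(input);
        if (!match.Success)
        {
            // All we know is "false"; there is no hint about which part was wrong.
            return null;
        }

        // Group 0 is the whole match; groups 1-3 are still strings,
        // so each one has to be converted (and so re-validated) here.
        return new int[]
        {
            int.Parse(match.Groups[1].Value),
            int.Parse(match.Groups[2].Value),
            int.Parse(match.Groups[3].Value)
        };
    }

    static void Main()
    {
        int[] fields = ToFields("08/14/2004");
        Console.WriteLine("{0}-{1}-{2}", fields[0], fields[1], fields[2]);
        Console.WriteLine(ToFields("8/14/04") == null);
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;Notice that the digits get examined twice: once by the expression and once again by int.Parse.&lt;/P&gt;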
&lt;P&gt;The point I have been trying to make with user feedback about expressions comes into play here. You'll notice that we can only tell the user (because the parser and validator is a black box) that the pattern was successful or not. We have no clue why the input failed. We can't identify whether it failed for a critical reason like an invalid pattern (failed parsing) or whether it failed for a data and logic reason (a date that doesn't exist in history). Imagine for instance you started a site that published the history of the world and the user is inputting dates to try and find data. On your site, some dates just won't go in and the user can't figure out why. They go to Google and search the same date and magically results appear. For dates that don't exist there are sure a lot of pages with information that happened on those dates. The reason? Well, not everyone in history gave a hoot that the calendar was changing and much of written history is overlapped with stuff that happened on dates that technically didn't exist.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Processing:&lt;BR&gt;&lt;/STRONG&gt;At the end of the data flow model you can't just keep a date around in the original string based form. A simple date with a 4 digit year is stored using 10 characters or 20 bytes, 10 if you are on a non-Unicode system. However, you really don't need that much information for a date. Without an associated time it will easily encode into a single integer. If you start parsing times then you'll need to use up to 8 bytes. The nice thing about the 8 byte version is that it is fully parsed and encoded. The information is retrieved using a series of quick mathematical transforms rather than having to reparse the string to get information.&lt;/P&gt;
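&lt;P&gt;As a rough illustration of the storage point, a date fits in one 32-bit integer and the fields come back out with plain arithmetic. The yyyymmdd packing below is just one possible encoding that I picked for the example; DateTime itself keeps date and time together in a 64-bit tick count.&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;

static class DateStorageDemo
{
    // Packs a date into one 32-bit integer as yyyymmdd (one possible encoding).
    public static int Pack(int year, int month, int day)
    {
        return year * 10000 + month * 100 + day;
    }

    // Recovers the fields with integer division and remainder; no string parsing.
    public static void Unpack(int packed, out int year, out int month, out int day)
    {
        year = packed / 10000;
        month = (packed / 100) % 100;
        day = packed % 100;
    }

    static void Main()
    {
        int packed = Pack(2004, 8, 14);
        int year, month, day;
        Unpack(packed, out year, out month, out day);
        Console.WriteLine("{0} = {1}/{2}/{3}", packed, month, day, year);

        // DateTime stores date and time in a single 64-bit tick count.
        long ticks = new DateTime(2004, 8, 14, 20, 18, 0).Ticks;
        Console.WriteLine(new DateTime(ticks).Hour);
    }
}
```
&lt;/PRE&gt;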
&lt;P&gt;If you plan on using the date then you need it in some form where you can work on the members. After all, you can't easily add days, months, years, or do any sort of transformation on the results of the string in expression form. Some users may point out that JScript allows you to work with the groups as if the underlying capture were actually an integer, but this overlooks the fact that every time you do that a conversion is being made on your behalf. That means once you've already validated the data once you have to validate it again using conversions, since a conversion is nothing more than a validation of the underlying string to see if it has an associated strongly typed integer representation. Just doesn't seem to make much sense, does it?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Conclusion:&lt;BR&gt;&lt;/STRONG&gt;The data flow model for parsing dates with expressions is bleak because you wind up spending a large amount of time parsing and validating inside of the expression itself only to do a large amount of additional processing after the fact. Michael has pointed out that existing methods for parsing often don't give much feedback&amp;nbsp;either. With that in mind, we'll examine the data flow model and the additional feedback provided by using an existing parser with a supplementary validator. You'll see how converting data as you parse it prevents extra processing later and how deconstructing the black-box of the parser/validator that is a regular expression will enable you to start providing more feedback.&lt;/P&gt;
&lt;div class = "shareblock"&gt;&lt;strong&gt;Share this post:&lt;/strong&gt; &lt;a href = "mailto:?body=Thought you might like this: http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx&amp;amp;;subject=Examining+the+data+flow+model+of+using+Regular+Expressions+for+Dates" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx"&gt;email it!&lt;/a&gt; |  &lt;a href = "http://del.icio.us/post?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx&amp;amp;;title=Examining+the+data+flow+model+of+using+Regular+Expressions+for+Dates" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx"&gt;bookmark it!&lt;/a&gt; |  &lt;a href = "http://www.digg.com/submit?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx&amp;amp;;phase=2" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx"&gt;digg it!&lt;/a&gt; |  &lt;a href = "http://reddit.com/submit?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx&amp;amp;title=Examining+the+data+flow+model+of+using+Regular+Expressions+for+Dates" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx"&gt;reddit!&lt;/a&gt; |  &lt;a href = "http://www.dotnetkicks.com/submit/?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx&amp;amp;;title=Examining+the+data+flow+model+of+using+Regular+Expressions+for+Dates" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx"&gt;kick it!&lt;/a&gt; |  &lt;a href = "https://favorites.live.com/quickadd.aspx?marklet=1&amp;amp;;mkt=en-us&amp;amp;;url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx&amp;amp;;title=Examining+the+data+flow+model+of+using+Regular+Expressions+for+Dates&amp;amp;;top=1" target="_blank" title = "Post 
http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/339.aspx"&gt;live it!&lt;/a&gt;&lt;/div&gt;&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=339" width="1" height="1"&gt;</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /></entry><entry><title>Performance: Character Classes versus Alternation Groups</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx</id><published>2004-08-14T07:53:00Z</published><updated>2004-08-14T07:53:00Z</updated><content type="html">&lt;P&gt;If heated discussion with Michael Ash keeps resulting in this sort of performance break-through then I'll argue with him every day of the week. To follow the performance examinations you are going to have to make sure that both Compiled and ExplicitCapture are on. While it might be nice to also check the interpreted speed, I can imagine that when running in interpreted mode, most of the speed enhancements of the alternation group are completely lost.&lt;/P&gt;
&lt;P&gt;Let's start the process by examining the process of matching the inputs of 10 and 12. First we'll do what I call finger matching using a character class. The expression we'll use is 1[02]. Finger matching looks something like this, though having some images would be better, I don't have the time to make them.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;Op #: Description&lt;BR&gt;Op 1: Input[0] = Pattern[0]; // Input = &amp;#8220;11&amp;#8221; for now&lt;BR&gt;Op 2: Fail or Scan; // Fail on no equal, Scan next char on equals&lt;BR&gt;Op 3: Setup Character Class Match; // Here we do math to set up a binary search&lt;BR&gt;Op 4: Input[1] = CharClass[1]; // Because we have two elements 0 and 1, we'll try to match first 1 then 0&lt;BR&gt;Op 5: Succeed or Input[1] = CharClass[2];&lt;BR&gt;Op 6: Succeed or Fail;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;Now, don't let the number of operations confuse you. The character class matching may look really fast, but it involves setting up a call stack, calling a method, scanning through a string representing the character class itself, and finally returning some true/false value. That can be costly, especially compared to branch and comparison statements. That brings us to an alternation group that does the same thing. It looks something like this: 1(0|2). Remember that explicit capture is on. If not, you'll kill your perf. If you don't want explicit capture then at least prevent capture using 1(?:0|2):&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;Op 1: Input[0] = Pattern[0];&lt;BR&gt;Op 2: Fail or Scan&lt;BR&gt;Op 3: Input[1] = Alt[0][0]; // First alternation group, first check&lt;BR&gt;Op 4: Succeed or Input[1] = Alt[0][1]; // We reach the comparison or success through a branch statement. Fast!&lt;BR&gt;Op 5: Succeed or Fail&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;There is only one less conceptual operation, but several hundred less machine operations. We find that in the compiled scenarios running a worst case scenario match (aka a match that won't be found in the alternation group or character class) will examine the performance potentials. Here is the code for running against the digit class and the lower case character class (note that going against the lower case word character class is even slower because of Unicode support, so we use an explicit character class).&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Regex regex23 = new Regex("^1\\d$", RegexOptions.Compiled | RegexOptions.ExplicitCapture);&lt;BR&gt;Regex regex24 = new Regex("^1(0|1|2|3|4|5|6|7|8|9)$", RegexOptions.Compiled | RegexOptions.ExplicitCapture);&lt;/P&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Regex regex25 = new Regex("^a[a-z]$", RegexOptions.Compiled | RegexOptions.ExplicitCapture);&lt;BR&gt;Regex regex26 = new Regex("^a(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)$", RegexOptions.Compiled | RegexOptions.ExplicitCapture);&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;If you are going to run these under a performance scenario then the use string &amp;#8220;1a&amp;#8221; for the first pattern. This will match the start of the pattern, and then fail matching the remaining required digit. For the alternation group this will mean 10 branch comparisons. For the character class it means 3 comparisons and the setup of the call-stack. For the other pattern, use &amp;#8220;a1&amp;#8221; and the same thing will happen. Run them once outside of the loop to get the grease rolling. This makes sure all of the methods are JIT'ed. This is important. Lots of people don't prime the pump, but you need to make sure and call every method at least once and that means possibly exercising hundreds of code-paths in complex scenarios to make sure the perf results don't include JIT time.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;// Compile the Shiznut&lt;BR&gt;regex23.IsMatch(Data11);&lt;BR&gt;regex24.IsMatch(Data11);&lt;BR&gt;regex25.IsMatch(Data12);&lt;BR&gt;regex26.IsMatch(Data12);&lt;/P&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;start = DateTime.Now;&lt;BR&gt;for(int i = 0; i &amp;lt; 10000000; i++) {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; regex23.IsMatch(Data11);&lt;BR&gt;}&lt;BR&gt;end = DateTime.Now;&lt;BR&gt;Console.WriteLine("Character Class [0-9] Elems {0}", end - start);&lt;/P&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;start = DateTime.Now;&lt;BR&gt;for(int i = 0; i &amp;lt; 10000000; i++) {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; regex24.IsMatch(Data11);&lt;BR&gt;}&lt;BR&gt;end = DateTime.Now;&lt;BR&gt;Console.WriteLine("Alternation Group (0|1|2|3|4|5|6|7|8|9) {0}", end - start);&lt;/P&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;start = DateTime.Now;&lt;BR&gt;for(int i = 0; i &amp;lt; 10000000; i++) {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; regex25.IsMatch(Data12);&lt;BR&gt;}&lt;BR&gt;end = DateTime.Now;&lt;BR&gt;Console.WriteLine("Character Class [a-z] Elems {0}", end - start);&lt;/P&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;start = DateTime.Now;&lt;BR&gt;for(int i = 0; i &amp;lt; 10000000; i++) {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; regex26.IsMatch(Data12);&lt;BR&gt;}&lt;BR&gt;end = DateTime.Now;&lt;BR&gt;Console.WriteLine("Alternation Group (a|b|...|y|z) {0}", end - start);&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Just a basic set of loops. We are using IsMatch here rather than Match. Match creates some additional objects on the stack and we want to minimize the amount of memory thrash. The results are interesting to say the least. I am displaying the closest set of runs that I was able to capture giving the character class the benefit of the doubt. In my testings the character classes were very iffy in their results and timings fluctuate wildly. For the alternation group the timings are much more stable. I imagine the extra stack thrashing for character classes might have something to do with this timing difference.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Character Class [0-9] Elems 00:00:02.5636864&lt;BR&gt;Alternation Group (0|1|2|3|4|5|6|7|8|9) 00:00:02.3533840&lt;BR&gt;Character Class [a-z] Elems 00:00:02.4334992&lt;BR&gt;Alternation Group (a|b|...|y|z) 00:00:02.3834272&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Now with this knowledge, that alternation groups are somewhat faster, you can do some more work to get even better results. For instance, you can order the precedence of characters in the alternation group to more closely resemble the density of characters that you are matching against. For the english language this might mean matching vowels in the beginning. You'll get huge perf wins whenever the alternation group has a character matching a character in the input, but still nothing when the character is outside of the range.&lt;/P&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;There is one note. A huge peformance gain exists that isn't being taken advantage of. A linear character class such as [a-z] can be turned into two boundary checks. This would certainly save a lot of time. A complex class such as [0-9a-zA-Z] could be turned into 3 such boundary checks. Maybe Kit will grace us with a view and put this on a possible future performance improvement list. While this won't work for complex character classes especially those that are unicode aware, it does work for user specified classes.&lt;/P&gt;
&lt;div class = "shareblock"&gt;&lt;strong&gt;Share this post:&lt;/strong&gt; &lt;a href = "mailto:?body=Thought you might like this: http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx&amp;amp;;subject=Performance%3a+Character+Classes+versus+Alternation+Groups" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx"&gt;email it!&lt;/a&gt; |  &lt;a href = "http://del.icio.us/post?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx&amp;amp;;title=Performance%3a+Character+Classes+versus+Alternation+Groups" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx"&gt;bookmark it!&lt;/a&gt; |  &lt;a href = "http://www.digg.com/submit?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx&amp;amp;;phase=2" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx"&gt;digg it!&lt;/a&gt; |  &lt;a href = "http://reddit.com/submit?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx&amp;amp;title=Performance%3a+Character+Classes+versus+Alternation+Groups" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx"&gt;reddit!&lt;/a&gt; |  &lt;a href = "http://www.dotnetkicks.com/submit/?url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx&amp;amp;;title=Performance%3a+Character+Classes+versus+Alternation+Groups" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx"&gt;kick it!&lt;/a&gt; |  &lt;a href = "https://favorites.live.com/quickadd.aspx?marklet=1&amp;amp;;mkt=en-us&amp;amp;;url=http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx&amp;amp;;title=Performance%3a+Character+Classes+versus+Alternation+Groups&amp;amp;;top=1" target="_blank" title = "Post http://regexadvice.com/blogs/justin_rogers/archive/2004/08/14/338.aspx"&gt;live 
it!&lt;/a&gt;&lt;/div&gt;&lt;img src="http://regexadvice.com/aggbug.aspx?PostID=338" width="1" height="1"&gt;</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /></entry><entry><title>Expressions, Parsers, Validators, User Feedback and the tradeoffs we make.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/08/10/337.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/08/10/337.aspx</id><published>2004-08-11T01:04:00Z</published><updated>2004-08-11T01:04:00Z</updated><content type="html">&lt;P&gt;While the traffic has a local nature to regexadvice.com/blogs, there has been a lot of examination of various expressions where a date processing expression has become the topic of note. Personally, I have huge problems with constantly having a date expression tossed out as an example. Primarily the expression is monolithic and already starting to breach, in terms of complexity, the barrier of usefulness. Second, because of its huge nature, and the large amount of time the original author took to create the expression, working with it to prove or disprove discussion is tedious and time consuming, almost not worth it. I'll take a short moment to lay in cement some ideas I have about the usefulness of regular expressions as validators.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Expressions:&lt;/STRONG&gt;&lt;BR&gt;Let's start with expressions. Expressions, while they appear light on the surface, are nothing more than generic parsing routines. Each element you place in the expression string is translated into string tokenizing and comparison routines. This is no different from writing in a computer language to give the computer instructions as to what to do. Expressions are a language in and of themselves, and the complexity of the underlying implementation simply goes unnoticed. If you've checked any of Darren's or my posts on lexing and parsing you've gotten a sneak peek at what goes on behind the scenes in a regular expression engine.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Parsers:&lt;BR&gt;&lt;/STRONG&gt;Parsers are hard-coded routines for examining underlying data. If we take the date-time expression we can convert it into only a few lines of parsing code, definitely fewer than 30 even if we are very verbose. In fact the number of characters (not lines) in the final parser program is actually smaller than in the date expression Michael Ash has posted. His does some extra work though, namely validation, so we'll get to that in a second. Underlying every expression is a highly optimized generic parser, but no generic parser can run at the same speed as an optimized specialized parser. It just can't happen. You can compile expressions under .NET and in some cases the resulting code is more specialized and runs much faster, but it still uses the underlying syntax and semantics of the expression engine which may still create some slow-downs.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Validators:&lt;BR&gt;&lt;/STRONG&gt;After a parser is complete the data it retrieves needs to be validated in many cases. The date routines have a number of conditionals that validate certain date regions that never actually existed as calendars were changed to support new knowledge of the movement of the Earth around the Sun. These validations run as assertions; as soon as one of them rejects the input the expression fails, and the only feedback the user gets is a boolean saying whether or not the date is valid.&lt;/P&gt;
&lt;P&gt;Now, how many of you know the list of invalid date combinations between 1 AD and 9999 AD? I can definitely say that I don't. Validators are in place so that once the specialized parser is complete the values retrieved can be checked against lists or functions of valid or invalid values. The result is to provide feedback to the user in some form or another.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;User Feedback:&lt;BR&gt;&lt;/STRONG&gt;The end result of parsing and validation is generally user feedback. The reason I can't stand behind monolithic boolean result expressions is that they don't tell the user what was wrong with the input. If there is a date range in the 1500's that is invalid because of a calendar change then the user needs to be made aware. By all modern calendar rules the date would be valid, but the dates disappeared from history and the user needs to know this else they'll find something wrong with your parsing/validation logic.&lt;/P&gt;
&lt;P&gt;This brings up two types of validation. The first type of validation is the validation you perform on user input. The next is the validation the user performs when working with your software. If you don't give a reason why the user is wrong, the user will think that you are wrong. They'll think that your stuff is broken. Further, if you don't give real feedback about why the input is invalid the user can't double check you. After all, if you tell them why they are wrong, but they verify the input is actually valid, then they can notify you of errors in your own program. As Michael has pointed out, he has had many issues with various regular expression engines handling different clauses in different ways. At the end of the day I can't honestly say I'd feel safe using the expressions because of their complexity, their lack of feedback to the user as to what EXACTLY is wrong with the input date, their required support of an underlying regular expression engine, their reliance on complex features, and the fact that a specialized parser does the job faster, more accurately, and with more feedback potential.&lt;/P&gt;
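&lt;P&gt;As a rough illustration of the parser/validator split described above, here is a minimal sketch in Python rather than .NET. The function name check_date, the MM/DD/YYYY input format, and the wording of the messages are assumptions made for this example, not code from the posts being discussed. It leans on the standard calendar module, which applies modern Gregorian rules throughout, so it knows nothing about dates dropped during historical calendar changes; a real validator would add that check as one more step with its own message.&lt;/P&gt;

```python
import calendar


def check_date(text):
    """Return (ok, message) for a date written as MM/DD/YYYY (assumed format)."""
    # Parsing step: split the string and turn each piece into an integer.
    parts = text.strip().split("/")
    if len(parts) != 3:
        return False, "expected three parts separated by '/', as in 08/14/2004"
    values = []
    for name, part in zip(("month", "day", "year"), parts):
        if not (part.isascii() and part.isdigit()):
            return False, "the " + name + " must be a whole number, got '" + part + "'"
        values.append(int(part))
    month, day, year = values

    # Validation step: every rule that can fail carries its own explanation.
    if year not in range(1, 10000):
        return False, "the year must be between 1 and 9999"
    if month not in range(1, 13):
        return False, "the month must be between 1 and 12"
    last_day = calendar.monthrange(year, month)[1]
    if day not in range(1, last_day + 1):
        return False, "the day must be between 1 and " + str(last_day) + " for that month and year"
    return True, "ok"


print(check_date("08/14/2004"))
print(check_date("02/30/2004"))
```

&lt;P&gt;For 02/30/2004 the sketch answers with the reason (the day must be between 1 and 29 for that month and year) instead of a bare false, which is the kind of feedback a single boolean expression can't give.&lt;/P&gt;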
&lt;P&gt;&lt;STRONG&gt;Trade-Offs:&lt;BR&gt;&lt;/STRONG&gt;What are the trade-offs between parsers and expressions? Well, supposedly expressions are easy to use&amp;nbsp;(not generate, but use), are compact, and are portable between multiple expression systems. However, this isn't necessarily the case. The one thing that expressions do for the user is generate parsers and validators of data without requiring the user to understand the underlying technology of parsing and validating data. They are a tool-set and an abstraction.&lt;/P&gt;
&lt;P&gt;In the end, the trade-offs are going to be:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Speed&lt;/STRONG&gt; - Parsing and validation logic is going to run much faster when written in sheer code.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Portability&lt;/STRONG&gt; - Basic parsing code is extremely portable and can be quickly translated between languages. Expression engines are expected to live up to a spec, but general and widespread implementations often differ in small details.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Complexity of Validation&lt;/STRONG&gt; - Basic validation in an expression can often require specialized constructs. You often sacrifice the level of validation you can perform.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Feedback&lt;/STRONG&gt; - This is the most important feature of any validation. You have to give the user proper feedback so they can either fix the string or find errors in your validation logic. Feedback is almost always platform specific and doesn't integrate well with complex validation rule-sets inside of a single regular expression.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Quote of the Post - &lt;EM&gt;Use the appropriate tool for the job by identifying solutions that solve all end user and programmer related problems.&lt;/EM&gt;&lt;/P&gt;
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /><category term="Lexing and Parsing" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Lexing+and+Parsing/default.aspx" /></entry><entry><title>What does conditional matching really mean in a regular expression?</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/08/10/336.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/08/10/336.aspx</id><published>2004-08-10T06:02:00Z</published><updated>2004-08-10T06:02:00Z</updated><content type="html">&lt;P&gt;Darren recently drew an example, but even though I'm Regex adept, I didn't find his meaning &lt;A id=_74af7b23657_HomePageDays_DaysList__ctl1_DayItem_DayList__ctl1_TitleUrl href="/dneimke/archive/2004/08/09/1467.aspx"&gt;&lt;FONT color=#223355&gt;Conditional Matching&lt;/FONT&gt;&lt;/A&gt;. He basically shows a way to match Thur in Thursday or Thurday, but then if there is an s, make sure that day also follows. Now I would assert that you can tell what a regex is supposed to do by listing all of the things it matches. 
Turns out that his expression will either match Thursday, Thurs, or Thur. I don't know what his intention was, but I can talk a bit about what the conditional syntax buys you...&lt;/P&gt;
&lt;P&gt;Let's start by taking a look at what might be a possible single level nesting block capture expression:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;^\s*({)?(?(1).+?}|.+?$)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;This is designed to match two types of statements: single-line statements that match all of the characters on a single line, or nested blocks that exist between matching curly braces. It isn't built for multiple levels of nesting, but instead just a basic pattern.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;Foo&lt;BR&gt;Bar&lt;BR&gt;Baz&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Foo&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Bar&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Baz&lt;BR&gt;}&lt;BR&gt;Foo&lt;BR&gt;Bar&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Foo&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Bar&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Baz&lt;BR&gt;}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Now this appears to be an excellent reason to use conditional matching. First, we look for an opening brace. If we don't find one then we process one line of input using the &lt;STRONG&gt;else&lt;/STRONG&gt; clause. If we do find one then we process the &lt;STRONG&gt;if&lt;/STRONG&gt; clause. How would this look as a normal expression though? Can we gain any insight by looking at an expression that does the same thing without the use of the conditional? I sure hope so because it speaks volumes about what a conditional clause really buys you.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;^\s*([^{].+?$|{.+?})&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Above is one version of a possible expression. We specify an &lt;STRONG&gt;else&lt;/STRONG&gt; clause by guaranteeing our condition won't be true. We'll match any character except for an open brace and then begin to match a single line statement. As for the &lt;STRONG&gt;if&lt;/STRONG&gt; clause, it is simply rewritten as the second part of the alternation group. If the first character doesn't match the negative character class containing only the open brace, then the open brace must be there and we match it. This just shows off a basic if/then. A basic if/then can always be turned into a two-element alternation group.&lt;/P&gt;
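&lt;P&gt;One way to sanity check the claim is to run both patterns over the same inputs. This is a small sketch using Python's re module, which happens to support the same (?(1)...) group conditional; the sample strings and the helper name matched_text are made up for the example. Each sample sits on a single line, because without a single-line (dot matches newline) option the dot will not cross the line breaks in the multi-line sample shown earlier.&lt;/P&gt;

```python
import re

# The two patterns from the post, copied as written.
CONDITIONAL = re.compile(r"^\s*({)?(?(1).+?}|.+?$)")
ALTERNATION = re.compile(r"^\s*([^{].+?$|{.+?})")


def matched_text(pattern, line):
    """Return the text the pattern matches at the start of line, or None."""
    found = pattern.match(line)
    return found.group(0) if found else None


samples = ["Foo", "  Bar", "{ Foo }", "{ Foo } trailing"]
results = [(matched_text(CONDITIONAL, s), matched_text(ALTERNATION, s)) for s in samples]

for sample, (with_conditional, with_alternation) in zip(samples, results):
    print(repr(sample), repr(with_conditional), repr(with_alternation))
```

&lt;P&gt;For these four inputs both patterns return the same text. That is evidence for the rewrite, not a proof that the two agree on every possible input.&lt;/P&gt;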
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;You start to see more where conditionals are useful when the space between the match group (condition) and the conditional expression grows either large or complex. Anything that exists between the two has to be repeated in each branch of the equivalent alternation. We'll call this the BEGIN..END expression.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;^(BEGIN )?[a-zA-Z0-9][_a-zA-Z0-9]*\([^\)]+\)(?(1) END)\s*$&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;The BEGIN..END is optional in the expression, but as soon as you add one, you have to also add the other. The END is required whenever the BEGIN is seen. How would you build this requirement in using alternation groups?&amp;nbsp;Well, you have to repeat the middle pattern. The bigger the pattern gets between the condition and the conditional the more pattern you need to repeat. In the above sample this can mean quite a bit.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;^(BEGIN [a-zA-Z0-9][_a-zA-Z0-9]*\([^\)]+\) END|[a-zA-Z0-9][_a-zA-Z0-9]*\([^\)]+\))\s*$&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;That makes conditionals pretty darn nice in my book since they save you a bunch of typing. Using the same conditional multiple times still reduces into a single alternation group. So a bunch of extra work isn't apparent in this scenario. Adding more than a single conditional has some very negative impacts on the final expression. You can either increase the number of alternations exponentially or linearly depending on how the conditionals are organized. Maybe I'll get into that complexity another time. A quick summary follows:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN-RIGHT: 0px"&gt;Conditionals allow you to select between up to two patterns based on the match success of previous patterns.&lt;/DIV&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN-RIGHT: 0px"&gt;Conditionals can be dependent on either a previous numbered or named match group or on a pattern that is specified as the conditional.&lt;/DIV&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN-RIGHT: 0px"&gt;A single conditional can be rewritten as an alternation group with two patterns.&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN-RIGHT: 0px"&gt;The true or left pattern is normally a combination of the optional match group, the intermediate pattern, and the &lt;STRONG&gt;if&lt;/STRONG&gt; clause&lt;/DIV&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN-RIGHT: 0px"&gt;The false or right pattern is normally a combination of the intermediate pattern and the &lt;STRONG&gt;else&lt;/STRONG&gt; clause&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN-RIGHT: 0px"&gt;A single conditional used multiple times in a pattern can still be rewritten as an alternation group with two patterns&lt;/DIV&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN-RIGHT: 0px"&gt;Multiple conditionals in the same pattern can increase the number of alternations linearly or exponentially or a combination of both&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN-RIGHT: 0px"&gt;A conditional with a second conditional in the &lt;STRONG&gt;if&lt;/STRONG&gt; clause increases linearly from 2 to 3 alternations&lt;/DIV&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN-RIGHT: 0px"&gt;A pattern with two separate conditionals increases exponentially from 2 to 4 alternations (think binary truth tables)&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/BLOCKQUOTE&gt;
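&lt;P&gt;The BEGIN..END rewrite from earlier in the post can be checked by running both forms over a few inputs. This sketch assumes Python's re module rather than the .NET engine, and the names MIDDLE, CONDITIONAL and ALTERNATION are just labels for this example. Factoring the shared middle into one string makes the cost visible: the conditional uses it once, the alternation twice.&lt;/P&gt;

```python
import re

# The intermediate pattern: an identifier followed by a parenthesized argument list.
MIDDLE = r"[a-zA-Z0-9][_a-zA-Z0-9]*\([^\)]+\)"

# Conditional form: END is required only when group 1 (BEGIN) took part in the match.
CONDITIONAL = re.compile(r"^(BEGIN )?" + MIDDLE + r"(?(1) END)\s*$")

# Alternation form: the middle pattern has to be written out in both branches.
ALTERNATION = re.compile(r"^(BEGIN " + MIDDLE + r" END|" + MIDDLE + r")\s*$")

# Sample inputs mapped to whether they should be accepted.
samples = {
    "BEGIN Foo(x) END": True,
    "Foo(x)": True,
    "BEGIN Foo(x)": False,
    "Foo(x) END": False,
}

verdicts = {s: (bool(CONDITIONAL.match(s)), bool(ALTERNATION.match(s))) for s in samples}

for sample, verdict in verdicts.items():
    print(repr(sample), verdict)
```

&lt;P&gt;Both forms accept the first two samples and reject the last two, so on these inputs the alternation behaves like the conditional it replaces.&lt;/P&gt;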
&lt;P style="MARGIN-RIGHT: 0px"&gt;&amp;nbsp;&lt;/P&gt;
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /></entry><entry><title>The not so basics of NOT!</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/08/06/335.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/08/06/335.aspx</id><published>2004-08-06T05:50:00Z</published><updated>2004-08-06T05:50:00Z</updated><content type="html">&lt;P&gt;It is very easy to build a regular expression that matches a particular pattern, but not one that matches the absence of a pattern. Thankfully we've been given some tools&amp;nbsp;in the form of assertions that make the ability to perform a NOT operation quite easy. Let's start by looking at the negative lookahead assertion using a pattern that has been in hot discussion lately over on Darren's blog &lt;A id=viewpost.ascx_TitleUrl HREF="/dneimke/archive/2004/08/03/1432.aspx"&gt;&lt;FONT color=#223355&gt;The sugary synax we love... Lookaround &lt;/FONT&gt;&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;The pattern matches &amp;amp;, in order to escape it as &amp;amp;amp;, but it doesn't escape the &amp;amp; if it is already part of an entity escape.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;&amp;amp;(?!lt;|rt;|amp;)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;We used the negative look-ahead to match all ampersands that aren't followed by an entity. Not too hard at all. The NOT in this case means look for specific items and fail the match if they exist. Along with the negative look-ahead assertion there is an equivalent look-behind assertion. With it you can match all patterns that aren't in turn preceded by something.&lt;/P&gt;
&lt;P dir=ltr&gt;Now, let's talk about the other forms of NOT that we are given with regular expressions that aren't related to assertions. First off we can build a negative character class group. This allows us to match all but a certain set of characters. If we wanted to begin building a simple version of the assertion pattern, we might at least make sure that the next character following the &amp;amp; wasn't an l, r or a.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;&amp;amp;[^lra]&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;Now we are matching all &amp;amp; that are NOT followed by the l, r, or a... If we continue building the pattern out we can even use character classes to form complex NOT operations for entire strings and groups of strings.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;&amp;amp;($|[^lra]|a[^m]|am[^p]|amp[^;]|[lr][^t]|[lr]t[^;])&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
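&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;We've now constructed a nearly equivalent pattern using the character class NOT operator to do the job of the negative look-ahead assertion. It isn't a drop-in replacement: the character classes consume the characters that follow the &amp;amp;, where the look-ahead only peeks at them, and an &amp;amp; followed by a partial entity name at the very end of the input is not matched. If you look at the comments in the post I've referenced you'll also notice a third syntax I use for creating a NOT operator, and that is to optionally match something and then only if it matches try to match an impossible character.&lt;/P&gt;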
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;&amp;amp;(?(lt;|rt;|amp;)^)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;The above pattern tries to match a beginning of string character if it finds any of the patterns or literals we don't want to follow the &amp;amp; character. Since we shouldn't be able to match the start of string it will cause the expression to fail and the conditional will act like a NOT operator.&lt;/P&gt;
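&lt;P&gt;To compare the first two forms of NOT, here is a minimal sketch using Python's re module instead of the .NET engine. The ampersand is spelled chr(38) only so the snippet can sit in this feed without entity escaping; the sample strings and the helper name found are assumptions for the example. The third form is left out because Python's conditionals can only test a capture group, not an arbitrary expression.&lt;/P&gt;

```python
import re

AMP = chr(38)  # the ampersand, spelled this way so the snippet needs no entity escaping

# Negative look-ahead: peek at what follows without consuming it.
LOOKAHEAD = re.compile(AMP + r"(?!lt;|rt;|amp;)")

# Character-class version: spell out every way the following text can avoid being an entity.
CLASSES = re.compile(AMP + r"($|[^lra]|a[^m]|am[^p]|amp[^;]|[lr][^t]|[lr]t[^;])")


def found(pattern, text):
    """True when the pattern matches somewhere in text."""
    return pattern.search(text) is not None


# Sample inputs mapped to whether they contain an ampersand that still needs escaping.
samples = {
    "fish " + AMP + " chips": True,
    "fish " + AMP + "amp; chips": False,
    "a " + AMP + "lt; b": False,
    "trailing " + AMP: True,
    "R" + AMP + "D": True,
}

verdicts = {s: (found(LOOKAHEAD, s), found(CLASSES, s)) for s in samples}

for sample, verdict in verdicts.items():
    print(repr(sample), verdict)
```

&lt;P&gt;Both patterns agree on whether each sample contains a bare ampersand. They differ in what they consume: on the last sample the look-ahead matches one character, while the character-class version also swallows the D, which matters if the match is fed to a replace.&lt;/P&gt;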
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Hopefully you've found some of this interesting, then again maybe not. Each tool in a regular expression can be viewed as a hint at how the pattern is going to operate. Some of the tools affect the string scanner (such as assertions that do a forward or backward scan based on the current stream offset) others control backtracking and greediness. By manipulating the tools and understanding how they work you can make the most performant expressions.&lt;/P&gt;
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /></entry><entry><title>Have you ever dreamed a problem?  Two nights in a row now, no sleep for me I guess...</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/05/26/334.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/05/26/334.aspx</id><published>2004-05-26T12:56:00Z</published><updated>2004-05-26T12:56:00Z</updated><content type="html">&lt;P&gt;When I get really zoned in on a problem I can't sleep, or rather I do sleep, but it seems that somehow the topic in my head becomes the makings of some really interesting dream.&amp;nbsp; I've always had mild insomnia, so I'm capable of sleeping and dreaming right on the edge of the asleep/wake border.&amp;nbsp; I've often found myself with an actual dream continuing even as I wake up and become conscious enough to do stuff around my house.&amp;nbsp; Definitely not something I recommend to others.&lt;/P&gt;
&lt;P&gt;My point?&amp;nbsp; Well, I kept thinking about this problem of decomposing the ranges into expressions.&amp;nbsp; ALL NIGHT.&amp;nbsp; My brain kept telling me that some additional items would be nice to add.&amp;nbsp; Namely, the creation of range coverage print-outs for each of the generated range decompositions.&amp;nbsp; If you look, the ranges are doing things like covering 0-9, then 10 through 99, then 100 through 999, etc.&amp;nbsp; That is the way things get decomposed.&amp;nbsp; Unfortunately doing it this way does leave one single spot for error.&amp;nbsp; A number such as 001 will match \d?\d?\d.&amp;nbsp; If you wanted something that rejected leading zeros, you'd have to add more to the expressions.&lt;/P&gt;
&lt;P&gt;Since they break down this way, it becomes easy to take our expressions and turn them back into the ranges that they would match.&amp;nbsp; I'm not talking about the standard ranges of say 0 through 5000, but the actual saturation ranges of 0 through 9, 10 through 99, etc..., so you know that every inch of your range has actually been covered and that your algorithm isn't screwing up.&lt;/P&gt;
&lt;P&gt;The second thing I kept coming to was the concept that merging two ranges wouldn't be that hard, or merging N arbitrary ranges.&amp;nbsp; The concept would be finding any ranges that overlap, and then merging their contents to produce a new match group that would match the expanded range or set of ranges.&amp;nbsp; Then I thought, hell this is nothing more than a subset of set theory.&amp;nbsp; I think that was the point I was trying to make.&amp;nbsp; You see Darren has been wanting a way into set theory, and I think he found it.&amp;nbsp; Each of the expressions we are forming denotes a set, and the key is to find if all members in the set defined by the range are also present in at most one other set that represents our decomposed expression.&amp;nbsp; Note the keywords there.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;X is our target number; X is in the set of { rangestart..rangeend }&lt;BR&gt;Y is our expression; Y is the superset of all sets { decomposed1..decomposedn }&lt;BR&gt;All T in X must be a member of Y; additionally, all T in X must be a member of at most one decomposed set and not more&lt;BR&gt;All T not in X must not be a member of Y&lt;/P&gt;
&lt;P&gt;Y is just a cute way of aliasing the set X, or is it?&amp;nbsp; Your Y is based on your decomposed expressions, so it becomes possible for Y and X to be different sets.&amp;nbsp; In fact, if you are using optional matching Y may match string representations of integers that don't exist in the integer set X.&amp;nbsp; However, Y and X will still have the same footprint if viewed in the light of integer value only.&amp;nbsp; After all, there is an int.Parse somewhere in the implementation and outside of the theory.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
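&lt;P&gt;The rules above can be turned into a brute-force check. What follows is a minimal sketch in Python, not the range generator discussed in these posts: the three-pattern decomposition of 0 through 999, the helper names owners and check_cover, and the margin of 50 values tested past the end of the range are all assumptions for illustration.&lt;/P&gt;

```python
import re

# Hypothetical decomposition of the range 0..999, one pattern per digit count.
DECOMPOSED = [r"\d", r"[1-9]\d", r"[1-9]\d\d"]
COMPILED = [re.compile(p) for p in DECOMPOSED]


def owners(text):
    """Return the indexes of the decomposed patterns that match text completely."""
    return [i for i, pattern in enumerate(COMPILED) if pattern.fullmatch(text)]


def check_cover(first, last, margin=50):
    """List every violation of the coverage rules for the integer range first..last."""
    problems = []
    # Every T in X must belong to Y, and to exactly one decomposed set.
    for n in range(first, last + 1):
        hits = owners(str(n))
        if len(hits) != 1:
            problems.append((n, hits))
    # T not in X must not belong to Y (only a margin above the range is sampled).
    for n in range(last + 1, last + 1 + margin):
        hits = owners(str(n))
        if hits:
            problems.append((n, hits))
    return problems


problems = check_cover(0, 999)
print("violations:", problems)

# The leading-zero case: optional digits accept 001, the strict decomposition does not.
loose = re.compile(r"\d?\d?\d")
print(loose.fullmatch("001") is not None, owners("001"))
```

&lt;P&gt;An empty list of violations means every integer in the range is claimed by exactly one decomposed expression and the sampled values just past the range are claimed by none. The last line shows the string versus integer gap mentioned above: 001 is accepted by the optional-digit pattern even though it belongs to none of the strict pieces.&lt;/P&gt;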
&lt;P dir=ltr&gt;Still looking for a darn math mark-up tool.&amp;nbsp; I've seen the MathML crap, but it requires a specialized viewer.&amp;nbsp; There are a couple of math packages that export in HTML and maybe those would work.&amp;nbsp; I'll have to keep my eyes open and find something, since the above looks a lot better in set notation than it does in written out verbose American English.&lt;/P&gt;
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /><category term="Numeric Parsing" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Numeric+Parsing/default.aspx" /></entry><entry><title>Added some extra features to my algorithms, and so I'm posting an input batch file you can use for testing your own.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/05/25/333.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/05/25/333.aspx</id><published>2004-05-26T00:50:00Z</published><updated>2004-05-26T00:50:00Z</updated><content type="html">&lt;P&gt;The batch file echoes expected output, along with the output of the program you are running.&amp;nbsp; If your output is different then you might have some code refactoring to do.&amp;nbsp; As soon as I get some algorithms from third parties I'll post my full algorithm.&amp;nbsp; I expect an algorithm from Darren shortly ;-)&lt;/P&gt;
&lt;P&gt;&lt;A id=CategoryEntryList.ascx_EntryStoryList_Entries__ctl0_TitleUrl HREF="/justin_rogers/articles/1173.aspx"&gt;Batch file for testing your regex range generator.&lt;/A&gt;&lt;/P&gt;
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /><category term="Numeric Parsing" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Numeric+Parsing/default.aspx" /></entry><entry><title>Thanks to Darren, I had to examine the process of neg-pos neg-neg ranges in addition to what I was already doing.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/05/25/332.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/05/25/332.aspx</id><published>2004-05-25T06:46:00Z</published><updated>2004-05-25T06:46:00Z</updated><content type="html">&lt;P&gt;Negative ranges are actually quite easy to implement.&amp;nbsp; There are a couple of options:&lt;/P&gt;
&lt;OL&gt;
&lt;OL&gt;
&lt;LI&gt;Parm1 +, Parm2 +&lt;/LI&gt;
&lt;LI&gt;Parm1 -, Parm2 -&lt;/LI&gt;
&lt;LI&gt;Parm1 -, Parm2 0&lt;/LI&gt;
&lt;LI&gt;Parm1 0, Parm2 -&lt;/LI&gt;
&lt;LI&gt;Parm1 +, Parm2 0&lt;/LI&gt;
&lt;LI&gt;Parm1 0, Parm2 +&lt;/LI&gt;&lt;/OL&gt;&lt;/OL&gt;
&lt;P&gt;6 options isn't that bad.&amp;nbsp; We also have relative equality, but we'll get to that.&amp;nbsp; So we have some new scenarios.&amp;nbsp; The first is parm1 being negative and parm2 being negative or 0.&amp;nbsp; This is basically a positive range with a negative symbol attached:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;return "-" + BuildCaptureForRange(Math.Abs(range1), Math.Abs(range2));&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;Next we have parm1 still negative with the absolute value equal to parm2.&amp;nbsp; This is the case of -500 through 500, and we want to special-case it by making the negative symbol optional.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;return "-?" + BuildCaptureForRange(0, range2);&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;The final special case is that we have a negative and a positive value.&amp;nbsp; This is basically two ranges, one from 0 to Abs(parm1) and one from 0 to parm2.&amp;nbsp; We switch between the two cases using a conditional expression of (?(-)-(neg range)|(pos range)).&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;return "(?(-)-" + BuildCaptureForRange(0, Math.Abs(range1)) + "|" + BuildCaptureForRange(0, range2) + ")";&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Note that I've minimized the number of special cases by pre-sorting parm1 and parm2 by value to make sure the lesser value is always in the first local variable.&amp;nbsp; I also have a special case (value) whenever range1 is equal to range2.&amp;nbsp; The overall code looks like this:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;// quick return for simple ranges&lt;BR&gt;if ( range1 == range2 ) {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return "(" + range1 + ")";&lt;BR&gt;}&lt;/P&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;// Swap the min/max values&lt;BR&gt;if ( range1 &amp;gt; range2 ) {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int temp = range2;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; range2 = range1;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; range1 = temp;&lt;BR&gt;}&lt;/P&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;if ( range1 &amp;lt; 0 &amp;amp;&amp;amp; range2 &amp;lt;= 0 ){&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return "-" + BuildCaptureForRange(Math.Abs(range1), Math.Abs(range2));&lt;BR&gt;} else if ( range1 &amp;lt; 0 ) {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if ( Math.Abs(range1) == range2 ) {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return "-?" + BuildCaptureForRange(0, range2);&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; } else {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return "(?(-)-" + BuildCaptureForRange(0, Math.Abs(range1)) + "|" + BuildCaptureForRange(0, range2) + ")";&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
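&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;To see the sign handling in action without the rest of the generator, here is a minimal sketch in Python (not the C# above).&amp;nbsp; It makes two loud assumptions: build_capture is a hypothetical stand-in for BuildCaptureForRange that simply lists every value, and the .NET conditional is swapped for a plain alternation, which accepts the same strings as long as the positive branch can never start with a minus sign.&amp;nbsp; The final fall-through for two non-negative bounds is also an addition.&lt;/P&gt;

```python
import re


def build_capture(lo, hi):
    # Hypothetical stand-in for BuildCaptureForRange: lists every value.
    return "(" + "|".join(str(n) for n in range(lo, hi + 1)) + ")"


def build_range(range1, range2):
    # Quick return for a single value.
    if range1 == range2:
        return "(" + str(range1) + ")"
    # Make sure the lesser value comes first.
    lo, hi = min(range1, range2), max(range1, range2)
    if 0 > lo and 0 >= hi:
        # Both bounds non-positive: a positive range behind a minus sign.
        return "-" + build_capture(abs(hi), abs(lo))
    if 0 > lo:
        if abs(lo) == hi:
            # Symmetric range: the minus sign is simply optional.
            return "-?" + build_capture(0, hi)
        # Mixed signs: plain alternation in place of the .NET conditional.
        return "(?:-" + build_capture(0, abs(lo)) + "|" + build_capture(0, hi) + ")"
    # Assumption: two non-negative bounds go straight to the range builder.
    return build_capture(lo, hi)


def accepts(pattern, value):
    # True when the whole decimal form of value matches the pattern.
    return re.fullmatch(pattern, str(value)) is not None
```

&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;With this sketch, build_range(500, -500) and build_range(-500, 500) give the same expression thanks to the pre-sort.&amp;nbsp; Note that the all-negative branch always requires the minus sign, just like the -(500|[1-4]?\d?\d) line in the expected output below, so a bare 0 is not accepted for a range such as -500 0.&lt;/P&gt;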
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Now that I've completed my &lt;STRONG&gt;Darren Task&lt;/STRONG&gt; I can probably go to bed.&amp;nbsp; You can't mention&amp;nbsp;something to me and not expect some code the next day (hour or maybe even minute).&amp;nbsp; For testing, I'm including a sample batch file for you to run over top of your own algorithms:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;@echo off&lt;BR&gt;BuildRangeGroups 5 255&lt;BR&gt;BuildRangeGroups 222 228&lt;BR&gt;BuildRangeGroups 699 700&lt;BR&gt;BuildRangeGroups 699 701&lt;BR&gt;BuildRangeGroups 699 710&lt;BR&gt;BuildRangeGroups 698 710&lt;BR&gt;BuildRangeGroups 7650 7710&lt;BR&gt;BuildRangeGroups 50 7710&lt;BR&gt;BuildRangeGroups 6998998 7000000&lt;BR&gt;BuildRangeGroups 7000 7999&lt;BR&gt;BuildRangeGroups 7555 7999&lt;BR&gt;BuildRangeGroups 1000 11000&lt;BR&gt;BuildRangeGroups 0 69989&lt;BR&gt;BuildRangeGroups -500 0&lt;BR&gt;BuildRangeGroups -500 500&lt;BR&gt;BuildRangeGroups 0 -500&lt;BR&gt;BuildRangeGroups 500 -500&lt;BR&gt;BuildRangeGroups -50 7710&lt;BR&gt;BuildRangeGroups -1000 69989&lt;BR&gt;BuildRangeGroups 0 1000&lt;BR&gt;BuildRangeGroups 0 255&lt;BR&gt;BuildRangeGroups 0 5000&lt;BR&gt;BuildRangeGroups 0 11000&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Here is the required output, so you can validate that your algorithm is complete.&amp;nbsp; If you only want to implement part of the algorithm, then that is fine.&amp;nbsp; Nothing requires that you implement the entire thing.&amp;nbsp; Just use the expressions below as reference.&amp;nbsp; Also, let me know if you think there is a range of specific note that might cause issues with a generator or even if some of my generated expressions below are inaccurate.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;(25[0-5]|[5-9]|2[0-4]\d|1?\d\d)&lt;BR&gt;(22[2-8])&lt;BR&gt;(700|699|[7-6]\d\d)&lt;BR&gt;(699|70[01]|[7-6]\d\d)&lt;BR&gt;(710|699|70\d|[7-6]\d\d)&lt;BR&gt;(710|69[89]|70\d|[7-6]\d\d)&lt;BR&gt;(7710|770\d|76[5-9]\d|7[7-6]\d\d)&lt;BR&gt;(7710|770\d|[5-9]\d|7[0-6]\d\d|[1-6]?\d\d\d)&lt;BR&gt;(7000000|699899[89]|6999\d\d\d|[7-6]\d\d\d\d\d\d)&lt;BR&gt;(7\d\d\d)&lt;BR&gt;(755[5-9]|75[6-9]\d|7[6-9]\d\d)&lt;BR&gt;(11000|10\d\d\d|[1-9]\d\d\d|\d\d\d\d)&lt;BR&gt;(699[0-8]\d|69[0-8]\d\d|6[0-8]\d\d\d|[1-5]?\d?\d?\d?\d)&lt;BR&gt;-(500|[1-4]?\d?\d)&lt;BR&gt;-?(500|[1-4]?\d?\d)&lt;BR&gt;-(500|[1-4]?\d?\d)&lt;BR&gt;-?(500|[1-4]?\d?\d)&lt;BR&gt;(?(-)-(50|[1-4]?\d)|(7710|770\d|7[0-6]\d\d|[1-6]?\d?\d?\d))&lt;BR&gt;(?(-)-(1000|\d?\d?\d)|(699[0-8]\d|69[0-8]\d\d|6[0-8]\d\d\d|[1-5]?\d?\d?\d?\d))&lt;BR&gt;(1000|\d?\d?\d)&lt;BR&gt;(25[0-5]|2[0-4]\d|1?\d?\d)&lt;BR&gt;(5000|[1-4]?\d?\d?\d)&lt;BR&gt;(11000|10\d\d\d|\d?\d?\d?\d)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
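&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;If you want to check expressions like these mechanically, a brute-force test is enough.&amp;nbsp; The following is a minimal sketch in Python rather than C#; the mismatches helper and its margin parameter are made up for illustration, and it assumes Python's re module treats these particular expressions the same way the .NET engine does.&lt;/P&gt;

```python
import re


def mismatches(pattern, lo, hi, margin=20):
    # Return the integers in and around [lo, hi] that the pattern gets wrong.
    regex = re.compile(pattern)
    wrong = []
    for n in range(lo - margin, hi + margin + 1):
        expected = n >= lo and hi >= n
        actual = regex.fullmatch(str(n)) is not None
        if actual != expected:
            wrong.append(n)
    return wrong
```

&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;An empty list means the expression accepts exactly the integers in the range, at least within the tested window.&amp;nbsp; For the -500 0 line it reports 0, since that expression insists on the minus sign.&amp;nbsp; Python's re refuses a reversed class such as [7-6] outright, so the lines containing one cannot be checked this way.&lt;/P&gt;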
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /><category term="Numeric Parsing" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Numeric+Parsing/default.aspx" /></entry><entry><title>And I thought I was done.  Reducing certain cases to optional captures is easy, others are more difficult...</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/05/25/331.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/05/25/331.aspx</id><published>2004-05-25T04:52:00Z</published><updated>2004-05-25T04:52:00Z</updated><content type="html">&lt;P&gt;Optional captures come into play when the minimum range value has fewer digits than the maximum range value.&amp;nbsp; Let's take a look at some ranges that exhibit the behavior so we have a base to work from.&amp;nbsp; The first range is a familiar one, and is the byte range of 0 - 255:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;25[0-5]&lt;BR&gt;2[0-4]\d&lt;BR&gt;1?\d?\d&amp;nbsp;&amp;nbsp; (match any 1 digit number, optionally any 2 digit number, optionally any 3 digit number starting with 1)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;To examine the complexity of making an arbitrary rule out of the above, let's see what happens when you add the range 5-255:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;25[0-5]&lt;BR&gt;2[0-4]\d&lt;BR&gt;[5-9]&lt;BR&gt;1?\d\d&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;So now that we have a single digit lower range, we now add at least 1 new capture.&amp;nbsp; When this happens, we move the optional clauses 1 left.&amp;nbsp; A simple shift.&amp;nbsp; That isn't too bad.&amp;nbsp; Since we don't get a huge amount of simplicity when using the optional captures, I don't see much of a reason to implement them.&amp;nbsp; Adding them for the basic case of the lower range being 0 isn't bad though and only changes the basic algorithm by some small amount.&amp;nbsp; How many optional captures will there be?&amp;nbsp; Max(0, MaxDigits - MinDigits - 1)...&amp;nbsp; In the case of a lower range of 5 and upper of 255 this becomes Max(0, 3 - 1 - 1) or Max(0, 1) or 1 optional capture.&amp;nbsp; Looking at 0 to 255 this becomes Max(0, 3 - 0 - 1) or Max(0, 2) or 2.&amp;nbsp; In this case 0 represents a lack of any digits.&lt;/P&gt;
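&lt;P dir=ltr&gt;The counting rule is small enough to write down directly.&amp;nbsp; A minimal sketch in Python (not C#), with hypothetical helper names, that treats a lower bound of 0 as having no digits, exactly as described above, and only considers non-negative bounds:&lt;/P&gt;

```python
def digit_count(value):
    # Per the rule above, a bound of 0 counts as having no digits at all.
    return 0 if value == 0 else len(str(value))


def optional_captures(lower, upper):
    # Max(0, MaxDigits - MinDigits - 1)
    return max(0, digit_count(upper) - digit_count(lower) - 1)
```

&lt;P dir=ltr&gt;optional_captures(5, 255) gives 1 and optional_captures(0, 255) gives 2, matching the two worked cases.&lt;/P&gt;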
&lt;P dir=ltr&gt;Things are definitely getting interesting now.&amp;nbsp; I've upgraded the algorithm a bit to solve most optional captures, but I've probably added some edge cases that produce failures in other areas of the algorithm now.&amp;nbsp; Here is some output from the upgraded algorithm (upgrade code is not yet hosted anywhere since&amp;nbsp;I want to test it for new failures).&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;runtests.cmd&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 0 255&lt;BR&gt;(25[0-5]|2[0-4]\d|1?\d?\d)&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 5 255&lt;BR&gt;(25[0-5]|[5-9]|2[0-4]\d|1?\d\d)&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 222 228&lt;BR&gt;(22[2-8])&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 699 700&lt;BR&gt;(700|699)&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 699 701&lt;BR&gt;(699|70[01])&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 699 710&lt;BR&gt;(710|699|70\d)&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 698 710&lt;BR&gt;(710|69[89]|70\d)&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 7650 7710&lt;BR&gt;(7710|770\d|76[5-9]\d)&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 50 7710&lt;BR&gt;(7710|770\d|[5-9]\d|7[0-6]\d\d|[1-6]?\d\d\d)&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 6998998 7000000&lt;BR&gt;(7000000|699899[89]|6999\d\d\d)&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 7000 7999&lt;BR&gt;(7[0-9]\d\d)&lt;/P&gt;
&lt;P dir=ltr&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 7555 7999&lt;BR&gt;(755[5-9]|75[6-9]\d|7[6-9]\d\d)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /><category term="Numeric Parsing" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Numeric+Parsing/default.aspx" /></entry><entry><title>An Algorithm has been prepared, I'm linking it into an article so you don't have to look at it if you don't want.  
It is a spoiler.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/05/24/330.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/05/24/330.aspx</id><published>2004-05-25T03:33:00Z</published><updated>2004-05-25T03:33:00Z</updated><content type="html">&lt;P&gt;The final algorithm has quite a few special cases, but I've attempted to refactor the code and make it as small as possible.&amp;nbsp; For readability I've chosen to use [0-9] in place of \d.&amp;nbsp; You can easily change this out and use the more common \d.&amp;nbsp; Note that the algorithm is not fully optimizing and it doesn't necessarily make use of optional characters, so a range such as 0 255 turns out as 5 alternation group items in place of&amp;nbsp;the 4 people commonly find and the 3 that are possible.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 0 255&lt;BR&gt;(25[0-5]|[0-9]|2[0-4][0-9]|[1-9][0-9]|1[0-9][0-9])&lt;/P&gt;
&lt;P&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 222 228&lt;BR&gt;(22[2-8])&lt;/P&gt;
&lt;P&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 699 700&lt;BR&gt;(700|699)&lt;/P&gt;
&lt;P&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 699 701&lt;BR&gt;(699|70[01])&lt;/P&gt;
&lt;P&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 699 710&lt;BR&gt;(710|699|70[0-9])&lt;/P&gt;
&lt;P&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 698 710&lt;BR&gt;(710|69[89]|70[0-9])&lt;/P&gt;
&lt;P&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 7650 7710&lt;BR&gt;(7710|765[0-9]|770[0-9]|76[6-9][0-9])&lt;/P&gt;
&lt;P&gt;C:\Projects\CSharp\RegularExpressions\BuildRangeGroups&amp;gt;BuildRangeGroups 6998998 7000000&lt;BR&gt;(7000000|699899[89]|6999[0-9][0-9][0-9])&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
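&lt;P dir=ltr&gt;The five-item group above and the three-item form, 25[0-5]|2[0-4]\d|1?\d?\d, accept the same canonical byte values.&amp;nbsp; A quick brute-force comparison, sketched in Python rather than C# and assuming Python's re behaves like the .NET engine for these simple expressions:&lt;/P&gt;

```python
import re

# The five-item group printed above and the shorter optional-digit form.
FIVE = re.compile(r"(25[0-5]|[0-9]|2[0-4][0-9]|[1-9][0-9]|1[0-9][0-9])")
THREE = re.compile(r"(25[0-5]|2[0-4]\d|1?\d?\d)")


def accepted(regex, limit=1000):
    # Every integer below limit whose plain decimal form matches in full.
    return [n for n in range(limit) if regex.fullmatch(str(n))]
```

&lt;P dir=ltr&gt;Both lists come out as 0 through 255.&amp;nbsp; The two forms do differ on zero-padded input: the optional-digit form accepts 05, while the five-item form does not.&lt;/P&gt;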
&lt;P dir=ltr&gt;Here is the code if you are eager to take a look.&amp;nbsp; &lt;A id=CategoryEntryList.ascx_EntryStoryList_Entries__ctl0_TitleUrl HREF="/justin_rogers/articles/1161.aspx"&gt;Sample Algorithm for Regex Range Validation&lt;/A&gt;&lt;/P&gt;
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /><category term="Numeric Parsing" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Numeric+Parsing/default.aspx" /></entry><entry><title>That range algorithm was more difficult than I originally pointed out due to some strange oddities.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/05/24/329.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/05/24/329.aspx</id><published>2004-05-25T00:51:00Z</published><updated>2004-05-25T00:51:00Z</updated><content type="html">&lt;P&gt;What kinds of strange oddities would I be talking about?&amp;nbsp; Well, check out the following oddity:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;699 through 700&lt;BR&gt;699|700&lt;/P&gt;
&lt;P&gt;699 through 701&lt;BR&gt;699|70[01]&lt;/P&gt;
&lt;P&gt;699 through 718&lt;BR&gt;699|70\d|71[0-8]&lt;/P&gt;
&lt;P&gt;698 through 701&lt;BR&gt;69[89]|70[01]&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;You start to get the picture.&amp;nbsp; There are some oddities when the max value has 0's and the min value has 9's along the right edge of the ranges.&amp;nbsp; A standard algorithm as I had outlined before would do the following:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;699 through 700&lt;BR&gt;leftmost = 0&lt;BR&gt;starti = min.Length - 1 = 2&lt;/P&gt;
&lt;P&gt;69[9-9]|70[0-0]|6[9-9]\d|7[0-0]\d&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
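&lt;P dir=ltr&gt;A quick brute-force check makes the over-matching visible.&amp;nbsp; This is a minimal sketch in Python rather than C#, assuming Python's re agrees with the .NET engine on these simple expressions; the accepted helper is made up for illustration.&lt;/P&gt;

```python
import re

# The group the naive algorithm produces, and the one we actually want.
NAIVE = re.compile(r"69[9-9]|70[0-0]|6[9-9]\d|7[0-0]\d")
WANTED = re.compile(r"699|700")


def accepted(regex, lo=600, hi=800):
    # Every integer from lo up to (not including) hi that matches in full.
    return [n for n in range(lo, hi) if regex.fullmatch(str(n))]
```

&lt;P dir=ltr&gt;The naive group accepts everything from 690 through 709, where the range only calls for 699 and 700.&lt;/P&gt;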
&lt;P dir=ltr&gt;You see the problem?&amp;nbsp; Simply finding the leftmost variable character isn't enough, since we also have some dependence on the rightmost character.&amp;nbsp; There is one solution we can code in: only process a pattern possibility if the character isn't an extreme character.&amp;nbsp; For the lower bound the extremes are 9's, and for the upper bound the extremes are 0's.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;699 through 700&lt;BR&gt;leftmost&amp;nbsp;= 0, starti =min.Length - 1=2, skip starti = 2 and starti = 1 based on extremes&lt;BR&gt;go to final range processing (step 5).&amp;nbsp; Since max-min is now only 1, we won't add any patterns.&amp;nbsp; A null result-set?&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;What our algorithm really needed to produce was 699|700.&amp;nbsp; However, based on our rules, that won't happen.&amp;nbsp; Another rule, then, might be that we add the pattern if we've reached (i-1) == leftmost.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;699 through 708&lt;BR&gt;leftmost = 0, starti = min.Length - 1 = 2&lt;BR&gt;starti = 2, skip 9 in 699, process 8 in 708, results 70[0-8]&lt;BR&gt;starti = 1, skip by default, hit rule (starti-1)==leftmost, same for 708, results 699|708&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;So we need another rule.&amp;nbsp; Only process the default pattern if we haven't already processed a pattern.&amp;nbsp; By that means, the end result is now 699|70[0-8].&amp;nbsp; Darn, we start to add a large number of rules to the generation of the range alternation group.&amp;nbsp; There are still some items I haven't fully made elegant yet, as I've recently realized.&amp;nbsp; Take the following:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;699 through 710&lt;BR&gt;699|70\d|710&lt;/P&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;Our default rule of only processing non-zero digits would exclude us from creating 70\d (actually it wouldn't, since our code would skip it the first time through and then create it the second time through using one of the exceptions).&amp;nbsp; Since 70\d gets created as an exception, 710 never gets created.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;I'll leave these thoughts with you.&amp;nbsp; I'm still waiting for some algorithms to show up!&lt;/P&gt;
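&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;For comparison, here is a minimal sketch of a different, widely used approach, written in Python rather than C#.&amp;nbsp; It is not the algorithm described in these posts: instead of walking digits and skipping extremes, it splits the range at values ending in 9's (counting up from the minimum) and just below values ending in 0's (counting down from the maximum), so that every piece fits a single prefix-plus-class pattern.&amp;nbsp; All function names are made up, it only handles non-negative bounds with the minimum first, and it does not attempt optional captures.&lt;/P&gt;

```python
import re


def fill_by_nines(value, count):
    # Replace the last `count` digits with 9s (650 becomes 659, 699 stays 699).
    return int(str(value)[:-count] + "9" * count)


def fill_by_zeros(value, count):
    # Replace the last `count` digits with 0s (711 becomes 710).
    return value - value % 10 ** count


def split_points(lo, hi):
    # Upper ends of the sub-ranges; each sub-range fits one simple pattern.
    stops = {hi}
    count = 1
    stop = fill_by_nines(lo, count)
    while stop >= lo and hi > stop:
        stops.add(stop)
        count += 1
        stop = fill_by_nines(lo, count)
    count = 1
    stop = fill_by_zeros(hi + 1, count) - 1
    while stop > lo and hi >= stop:
        stops.add(stop)
        count += 1
        stop = fill_by_zeros(hi + 1, count) - 1
    return sorted(stops)


def pattern_for(start, stop):
    # start and stop always have the same number of digits here.
    out = ""
    any_digits = 0
    for a, b in zip(str(start), str(stop)):
        if a == b:
            out += a
        elif a == "0" and b == "9":
            any_digits += 1
        else:
            out += "[" + a + "-" + b + "]"
    return out + r"\d" * any_digits


def range_regex(lo, hi):
    # Assumes both bounds are non-negative and lo is not greater than hi.
    parts = []
    start = lo
    for stop in split_points(lo, hi):
        parts.append(pattern_for(start, stop))
        start = stop + 1
    return "(" + "|".join(parts) + ")"
```

&lt;P dir=ltr style="MARGIN-RIGHT: 0px"&gt;range_regex(699, 700) gives (699|700) and range_regex(699, 710) gives (699|70\d|710), the two results wanted above.&lt;/P&gt;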
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Numeric Parsing" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Numeric+Parsing/default.aspx" /></entry><entry><title>Playing with the Regulator utility and I quickly find myself being second guessed by the Intellisense...</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/05/22/328.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/05/22/328.aspx</id><published>2004-05-23T02:24:00Z</published><updated>2004-05-23T02:24:00Z</updated><content type="html">&lt;P&gt;I can honestly say that I really don't like intellisense, but the addition of intellisense with the Regulator application is probably pretty powerful for a large number of users.&amp;nbsp; I started playing with the beta, and figured I'd load up some patterns that I've been playing with.&amp;nbsp; As I started typing my patterns in, I kept getting an intellisense popup.&amp;nbsp; When I continue typing, my last typed character often gets replaced.&amp;nbsp; A good example would be creating a digit match:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;\d&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;Once you type that in you'll get the intellisense popup.&amp;nbsp; However, try to type another digit escape sequence, and you'll find your d has been replaced by the next \ character.&amp;nbsp; Not fun, my friend, not fun at all.&lt;/P&gt;
&lt;P dir=ltr&gt;Nice little tool though, I must say.&amp;nbsp; It definitely has the ability to make my life a bit easier.&amp;nbsp; I had read some posts on making enhancements to the tool.&amp;nbsp; Some enhancements I'd like to see include plug-ins for generating test input to the expressions.&amp;nbsp; Right now you can load tests from files and that is pretty nice, but something that models your expression and tries to generate edge-case input would be really nice.&amp;nbsp; More cool features might include automatically generating expressions that match a large input such as a web page.&amp;nbsp; Hopefully some of Darren's work on the HTML parser might help here.&lt;/P&gt;
&lt;P dir=ltr&gt;A little jewel I plan on developing is the performant compiled expression.&amp;nbsp; This won't exist for every expression, but for many types of expression you can hard-code in certain assumptions that make processing much more performant.&amp;nbsp; In addition, you can create strongly typed processing classes that return real results rather than simply captured expanses of string.&amp;nbsp; Take the following:&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;(?&amp;lt;ID&amp;gt;\d\d\d\d)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;The above could easily be translated and parsed as an integer and placed within a field or property on a capture structure.&amp;nbsp; And this time, not a generic name/value structure like that used in the current .NET expressions.&amp;nbsp; The added amount of time consumed to strongly type the value will be more than made up for in terms of time used to access the items later.&amp;nbsp; In addition, you can compile with optimizations for specific use cases.&amp;nbsp; If the expression is for validation (IsMatch), then there is no reason to store captures and groups or even listen to hints about creating named capture groups, ordinal groups, etc...&amp;nbsp; This can be very powerful in server environments where validation should occur without the memory overhead of object allocations.&amp;nbsp; Other optimizations can also be performed making the use of regular expressions more like a scalpel than a blunt axe.&lt;/P&gt;
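&lt;P dir=ltr&gt;To make the idea concrete, here is a minimal sketch of what a strongly typed capture could look like.&amp;nbsp; It is only an illustration under a few assumptions: Python's re module stands in for the .NET engine, the named group is swapped for a plain numbered group, and the names IdCapture, parse_id and is_valid_id are made up for this example.&amp;nbsp; It shows the shape of the API, not the performance gains described above.&lt;/P&gt;
&lt;PRE&gt;
```python
import re
from dataclasses import dataclass

# Four digits, captured by position (group 1) rather than by name.
ID_PATTERN = re.compile(r"(\d\d\d\d)")


@dataclass(frozen=True)
class IdCapture:
    """Hypothetical strongly typed result: the ID is an int, not a captured string."""
    id: int


def parse_id(text):
    """Return an IdCapture for the first four-digit run in text, or None."""
    match = ID_PATTERN.search(text)
    if match is None:
        return None
    return IdCapture(id=int(match.group(1)))


def is_valid_id(text):
    """Validation-only path: answer yes or no without constructing an IdCapture."""
    return ID_PATTERN.fullmatch(text) is not None


print(parse_id("order 0042 shipped"))          # IdCapture(id=42)
print(is_valid_id("0042"), is_valid_id("42"))  # True False
```
&lt;/PRE&gt;
&lt;P dir=ltr&gt;The caller reads result.id as an integer directly instead of looking up a group and converting it on every access.&lt;/P&gt;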
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /></entry><entry><title>0 through N ranges are interesting, but what about M through N ranges.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/05/22/327.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/05/22/327.aspx</id><published>2004-05-22T22:40:00Z</published><updated>2004-05-22T22:40:00Z</updated><content type="html">&lt;P&gt;You can read about the decomposition of 0 through N ranges here &lt;A id=_73f89485eaf_HomePageDays_DaysList__ctl0_DayItem_DayList__ctl1_TitleUrl HREF="/justin_rogers/archive/2004/05/21/1139.aspx"&gt;Value range parsing using Regular Expressions. Breaking down the dotted decimal IP byte processing.&lt;/A&gt;&amp;nbsp; Make sure to read the comments, because there is a way to use even fewer decompositions than I listed in the article through the use of optional captures.&lt;/P&gt;
&lt;P&gt;Now the trick is examining ranges with both upper and lower bounds.&amp;nbsp; This is an empirical review as I'll take several ranges and decompose them, but I won't necessarily explain the mathematics for finding the count of the decompositions.&amp;nbsp; We'll start with the decomposition of 222-555, which requires 5 decompositions to yield all results.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;22[2-9]&lt;BR&gt;2[3-9]\d&lt;BR&gt;[3-4]\d\d&lt;BR&gt;5[0-4]\d&lt;BR&gt;55[0-5]&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;The next decomposition validates the range 22-555.&amp;nbsp; The difference here is that the numbers in the range no longer all have the same length.&amp;nbsp; The decomposition doesn't change much from that of 222-555, except that the lower bound of 22 is treated as 022, so the range becomes 022-555 and the same process applies.&amp;nbsp; The padding zeros are simply dropped from the resulting patterns.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;2[2-9]&lt;BR&gt;[3-9]\d&lt;BR&gt;[1-4]\d\d&lt;BR&gt;5[0-4]\d&lt;BR&gt;55[0-5]&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;You may notice a simple pattern between the two.&amp;nbsp; There is a symmetry between where the character classes are used to denote custom ranges of digits and the static captures (both \d and non custom character classes are considered static captures for the purpose of this example).&amp;nbsp; There is definitely a mathematical relationship between the layout of the ranges, and I could probably spend 10 or so pages explaining it, but I'll leave it out.&amp;nbsp; The point is there is also an algorithmic association between the numbers that recursively works from the right side of the first range back through to the left and back to the right of the second range.&lt;/P&gt;
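&lt;P dir=ltr&gt;Both decompositions above are easy to check by brute force.&amp;nbsp; The sketch below is only a sanity check, written in Python rather than .NET (the patterns use nothing engine-specific); the helper names compile_range and matched_values are made up for the example, and it assumes the pieces are joined with | and anchored to the whole input.&lt;/P&gt;
&lt;PRE&gt;
```python
import re

# The two decompositions from the post, one list entry per pattern.
RANGE_222_555 = [r"22[2-9]", r"2[3-9]\d", r"[3-4]\d\d", r"5[0-4]\d", r"55[0-5]"]
RANGE_22_555 = [r"2[2-9]", r"[3-9]\d", r"[1-4]\d\d", r"5[0-4]\d", r"55[0-5]"]


def compile_range(parts):
    """Join the decompositions with | and anchor the result to the whole string."""
    return re.compile(r"\A(?:" + "|".join(parts) + r")\Z")


def matched_values(parts, limit=2000):
    """Return every integer below limit whose decimal text the pattern accepts."""
    pattern = compile_range(parts)
    return [n for n in range(limit) if pattern.match(str(n))]


print(matched_values(RANGE_222_555) == list(range(222, 556)))  # True
print(matched_values(RANGE_22_555) == list(range(22, 556)))    # True
```
&lt;/PRE&gt;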
&lt;P dir=ltr&gt;First, sort the two bounds of the range into min and max.&amp;nbsp; This is important.&amp;nbsp; Second, determine the first non-matching character in the two bounds, starting from the left.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;1.&amp;nbsp; 222 and 555 are sorted thusly: min(222), max(555)&lt;BR&gt;2.&amp;nbsp; compare 2 to 5, since they are unmatched, then we start matching there.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;That second step is important.&amp;nbsp; Say the bounds were 222 and 228; you can see that the first non-matching character is in the ones place.&amp;nbsp; That means the left side of the range doesn't matter for the purposes of generating our ranges.&amp;nbsp; The range match simply becomes 22[2-8], and there is only a single decomposition.&amp;nbsp; In other words, the decomposition is based only on the digits from the left-most non-matching digit onward.&amp;nbsp; Any missing left side digits should be replaced with 0's and used in this process.&lt;/P&gt;
&lt;P dir=ltr&gt;The third step is to build range groups from the digit value to 9 as you walk the first string from right to left, stopping before the first non-matching digit.&amp;nbsp; After the ones place, each group starts one above the digit value, since the digit itself was covered by the previous group.&amp;nbsp; The fourth is to do the same for the second string, only this time you build from 0 up to the digit value (one below it after the ones place).&amp;nbsp; This results in (digits-1)*2 decompositions, where digits is the number of digits being compared; that is 4 for our three-digit example.&amp;nbsp; The final decomposition, or fifth step, is to build the middle range.&lt;/P&gt;
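&lt;P dir=ltr&gt;The first two steps can be sketched in a few lines.&amp;nbsp; This is just one way to do it, in Python; the function name split_common_prefix is made up for the example, and it assumes both bounds are non-negative integers.&lt;/P&gt;
&lt;PRE&gt;
```python
def split_common_prefix(low, high):
    """Sort the bounds, zero-pad them to equal width, and split off the shared leading digits."""
    low, high = min(low, high), max(low, high)
    width = len(str(high))
    lo_text, hi_text = str(low).zfill(width), str(high)
    index = 0
    # Advance while the digits agree; stop at the first non-matching position.
    while index != width and lo_text[index] == hi_text[index]:
        index += 1
    return lo_text[:index], lo_text[index:], hi_text[index:]


print(split_common_prefix(222, 228))  # ('22', '2', '8')
print(split_common_prefix(22, 555))   # ('', '022', '555')
```
&lt;/PRE&gt;
&lt;P dir=ltr&gt;For 222 and 228 only the last digit is left to decompose, which is why the whole range collapses to 22[2-8].&lt;/P&gt;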
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P dir=ltr&gt;3.&amp;nbsp; 22[2-9], 2[3-9]\d&lt;BR&gt;4.&amp;nbsp; 55[0-5], 5[0-4]\d&lt;BR&gt;5.&amp;nbsp; [3-4]\d\d&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir=ltr&gt;&lt;STRONG&gt;[Editorial Note: I just drank some death sauce and I'm experiencing a slight hot twinge in my mouth]&lt;/STRONG&gt;&lt;/P&gt;
&lt;P dir=ltr&gt;The fifth step depends on the relationship between the upper and lower bound digits at the first non-matching position.&amp;nbsp; If the difference between the two digits is less than 2, then the final step isn't needed and you leave it out.&amp;nbsp; If the difference is 2 or more, then you subtract 1 from the upper and add 1 to the lower to get your range.&lt;/P&gt;
&lt;P dir=ltr&gt;No refactoring, no fluff, just a quick range creation algorithm for regular expressions.&amp;nbsp; I challenge you to convert the algorithm into code.&amp;nbsp; There are several ways to implement each of the steps, and maybe even some more interesting algorithms.&lt;/P&gt;
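&lt;P dir=ltr&gt;For anyone who wants a starting point, here is one possible sketch of the five steps in Python.&amp;nbsp; It is not the only way to do it, and the names digit_class and range_decompositions are made up for the example.&amp;nbsp; It assumes non-negative integer bounds, drops the padding zeros from the lower bound, and skips any group that would be empty (for instance when a lower bound digit is already 9).&lt;/P&gt;
&lt;PRE&gt;
```python
def digit_class(first, last):
    """Render the digits first..last as regex text, or None when the span is empty."""
    span = range(first, last + 1)
    if len(span) == 0:
        return None
    if len(span) == 1:
        return str(first)
    if len(span) == 10:
        return r"\d"
    return "[{}-{}]".format(first, last)


def range_decompositions(low, high):
    """Decompose the inclusive range low..high into a list of regex pieces."""
    # Step 1: sort the bounds, then pad the lower one with 0's.
    low, high = min(low, high), max(low, high)
    width = len(str(high))
    lo_text, hi_text = str(low).zfill(width), str(high)
    if low == high:
        return [hi_text]
    # Step 2: find the first non-matching digit from the left.
    split = 0
    while lo_text[split] == hi_text[split]:
        split += 1
    # Steps 3 and 4: walk both bounds from right to left.
    lo_side, hi_side = [], []
    for pos in range(width - 1, split, -1):
        ones = 1 if pos == width - 1 else 0  # only the ones place keeps its own digit
        tail = r"\d" * (width - 1 - pos)
        lo_class = digit_class(int(lo_text[pos]) + 1 - ones, 9)
        hi_class = digit_class(0, int(hi_text[pos]) - 1 + ones)
        if lo_class is not None:
            # Leading zeros only exist because of the padding, so drop them.
            lo_side.append(lo_text[:pos].lstrip("0") + lo_class + tail)
        if hi_class is not None:
            hi_side.insert(0, hi_text[:pos] + hi_class + tail)
    # Step 5: the middle range (or the single range when only the ones place differs).
    if split == width - 1:
        middle = digit_class(int(lo_text[split]), int(hi_text[split]))
    else:
        middle = digit_class(int(lo_text[split]) + 1, int(hi_text[split]) - 1)
    parts = lo_side
    if middle is not None:
        parts.append(hi_text[:split] + middle + r"\d" * (width - 1 - split))
    return parts + hi_side


print("|".join(range_decompositions(222, 555)))  # 22[2-9]|2[3-9]\d|[3-4]\d\d|5[0-4]\d|55[0-5]
print("|".join(range_decompositions(22, 555)))   # 2[2-9]|[3-9]\d|[1-4]\d\d|5[0-4]\d|55[0-5]
```
&lt;/PRE&gt;
&lt;P dir=ltr&gt;The pieces still need to be joined with | and anchored before being used for validation, and the output is not minimal: 100 through 199 comes out as three pieces where 1\d\d would do.&lt;/P&gt;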
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Numeric Parsing" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Numeric+Parsing/default.aspx" /></entry><entry><title>Running your expressions asynchronously and making them cancellable.</title><link rel="alternate" type="text/html" href="http://regexadvice.com/blogs/justin_rogers/archive/2004/05/21/326.aspx" /><id>http://regexadvice.com/blogs/justin_rogers/archive/2004/05/21/326.aspx</id><published>2004-05-22T03:55:00Z</published><updated>2004-05-22T03:55:00Z</updated><content type="html">&lt;P&gt;I blogged this over on weblogs, but I think this was the better spot for it.&amp;nbsp; I'll definitely fix the classes up a bit more and add some additional functionality.&amp;nbsp; For one, I'd like to add batching so you could run a bunch of them with timing metrics.&lt;/P&gt;
&lt;P&gt;&lt;A id=_7fc01ddadf7_HomePageDays_DaysList__ctl0_DayItem_DayList__ctl0_TitleUrl href="http://weblogs.asp.net/justin_rogers/archive/2004/05/22/139337.aspx"&gt;Asynchronous Regular Expressions using the ThreadPool and a cancellation model.&lt;/A&gt;&lt;/P&gt;
</content><author><name>jrogers</name><uri>http://regexadvice.com/members/jrogers.aspx</uri></author><category term="Regular Expressions" scheme="http://regexadvice.com/blogs/justin_rogers/archive/tags/Regular+Expressions/default.aspx" /></entry></feed>