Got more questions? Find advice on: ASP | SQL | XML | Windows
in Search
Welcome to RegexAdvice Sign in | Join | Help

Justin's Regex Blog

Thinking in Regex

And I thought I was done. Reducing certain cases to optional captures is easy, others are more difficult...

Optional captures come into play when the minimum range value has fewer digits than the maximum range value.  Let's take a look at some ranges that exhibit the behavior so we have a base with which to work from.  The first range is a familiar one, and is the byte range of 0 - 255:

25[0-5]
2[0-4]\d
1?\d?\d   (match any 1 digit number, optionally any 2 digit number, optionally any 3 digit number starting with 1)

To examine the complexity of making an arbitary rule out of the above, let's see what happens when you add the range 5-255:

25[0-5]
2[0-4]\d
[5-9]
1?\d\d

So now that we have a single digit lower range, we now add at least 1 new capture.  When this happens, we move the optional clauses 1 left.  A simple shift.  That isn't too bad.  Since we don't get a huge amount of simplicity when using the optional captures, I don't see much of a reason to implement them.  Adding them for the basic case of the lower range being 0 isn't bad though and only changes the basic algorithm by some small amount.  How many optional captures will there be?  Max(0, MaxDigits - MinDigits - 1)...  In the case of a lower range of 5 and upper of 255 this becomes Max(0, 3 - 1 - 1) or Max(0, 1) or 1 optional capture.  Looking at 0 to 255 this becomes Max(0, 3 - 0 - 1) or Max(0, 2) or 2.  In this case 0 represents a lack of any digits.

Things are definitely getting interesting now.  I've upgraded the algorithm a bit to solve most optional captures, but I've probably added some edge cases that produce failures in other areas of the algorithm now.  Here is some output from the upgraded algorithm (upgrade code is not yet hosted anywhere since I want to test it for new failures).

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>runtests.cmd

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 0 255
(25[0-5]|2[0-4]\d|1?\d?\d)

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 5 255
(25[0-5]|[5-9]|2[0-4]\d|1?\d\d)

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 222 228
(22[2-8])

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 699 700
(700|699)

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 699 701
(699|70[01])

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 699 710
(710|699|70\d)

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 698 710
(710|69[89]|70\d)

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 7650 7710
(7710|770\d|76[5-9]\d)

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 50 7710
(7710|770\d|[5-9]\d|7[0-6]\d\d|[1-6]?\d\d\d)

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 6998998 7000000
(7000000|699899[89]|6999\d\d\d)

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 7000 7999
(7[0-9]\d\d)

C:\Projects\CSharp\RegularExpressions\BuildRangeGroups>BuildRangeGroups 7555 7999
(755[5-9]|75[6-9]\d|7[6-9]\d\d)

Sponsor
Published Tuesday, May 25, 2004 12:52 AM by jrogers

Comments

No Comments
Anonymous comments are disabled