I see this question come up fairly often with regular expressions, so I thought I'd blog about
it. It comes down to two things: named groups and captures. First, an
example...
I have a set of attributes attached to a value, and I want to match each of
them and then write them out. This is similar to matching the attributes of an
XML or HTML element:
Example text:
Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)
Sample pattern:
Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)
Problem 1: Named and Unnamed Groups
This pattern uses 2 named groups, "type" and "value", to store each attribute's
name and value. It also has 2 unnamed groups: one that matches an entire attribute
(such as "Animal=cat; ") and one that matches the "=" sign between type and value.
Looking at that pattern, you know there are going to be 5 groups (the 4 capturing
groups in the pattern plus group 0, which always holds the entire match) and,
reading left to right, you would probably expect them to appear in the following order:
- Group 0 : The unnamed entire match
- Group 1 : The unnamed attribute group
- Group 2 : The named "type" group
- Group 3 : The unnamed "=" group
- Group 4 : The named "value" group
Unnamed Groups always come first
The first important rule of .NET regexes is that unnamed groups are numbered
before named groups, so they always come first when you enumerate over a Groups
collection. The named groups are then numbered left to right after them, so the
order of our groups will actually be:
- Group 0 : The unnamed entire match
- Group 1 : The unnamed attribute group
- Group 2 : The unnamed "=" group
- Group 3 : The named "type" group
- Group 4 : The named "value" group
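To check the numbering for yourself, you can ask the Regex object. This is a minimal sketch (the GroupOrderDemo class and DescribeGroups method are just illustrative names) that walks the Groups collection by number and uses Regex.GroupNameFromNumber to look up each group's name; unnamed groups simply report their number as their name.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupOrderDemo
{
    static readonly Regex AttributeRegex =
        new Regex(@"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)");

    // Describes each group as "Group <number> : <name> = <value>", in group-number order.
    public static string[] DescribeGroups(string input)
    {
        Match m = AttributeRegex.Match(input);
        var lines = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            // Unnamed groups report their number ("0", "1", "2") as their name.
            string name = AttributeRegex.GroupNameFromNumber(i);
            // For a repeated group, Value is only the *last* capture.
            lines[i] = string.Format("Group {0} : {1} = \"{2}\"", i, name, m.Groups[i].Value);
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in DescribeGroups("Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Run against the example text, it lists the unnamed groups "1" and "2" ahead of "type" and "value", and shows that group 1 only reports its last capture, "Color=green;".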
Problem 2: Groups and Captures
Another gotcha with this example arises when you try to write all of the
results out to the screen. As you can see, there will be:
- 1 Match: the entire string
- the Groups we've already looked at
- and 4 attribute instances, each a type/value pair.
The question is: how do you get at each of those 4 attribute values? The
answer is that each Group has a Captures collection that stores every
"capture" the group made. The Group's own Value only holds its last capture
(for "type", that's "Color"), so the idea is to get the count of captures for
a group and then read the value at each index from 0 up to, but not including,
that count.
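Here's a small sketch of the difference between a Group's Value and its Captures, using the same pattern and input. The CaptureDemo class and CapturesOf helper are illustrative names of my own; Group.Captures and CaptureCollection are the standard System.Text.RegularExpressions types.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureDemo
{
    // Returns every capture recorded by the named group, in match order.
    public static string[] CapturesOf(Match m, string groupName)
    {
        CaptureCollection captures = m.Groups[groupName].Captures;
        var values = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            values[i] = captures[i].Value;
        }
        return values;
    }

    static void Main()
    {
        Match m = Regex.Match(
            "Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)",
            @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)");

        // Group.Value only reports the last capture of a repeated group...
        Console.WriteLine(m.Groups["type"].Value);                    // Color
        // ...while Captures keeps one entry per repetition.
        Console.WriteLine(string.Join(", ", CapturesOf(m, "type"))); // Animal, Human, Car, Color
        // The unnamed attribute group (number 1) captured each whole attribute.
        Console.WriteLine(m.Groups[1].Captures.Count);               // 4
    }
}
```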
Here's some sample code which demonstrates how you'd do that for the example
shown above:
```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string pattern = @"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)";
        string input = @"Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)";
        Match m = Regex.Match(input, pattern);
        if (m.Groups["type"].Success)
        {
            // this will tell us how many captures we have...
            int matchedItems = m.Groups["type"].Captures.Count;
            // now, enumerate the Captures and render the groups for each Capture...
            for (int i = 0; i < matchedItems; i++)
            {
                // "type" and "value" capture once per attribute, so index i pairs them up
                string name = m.Groups["type"].Captures[i].Value;
                string val = m.Groups["value"].Captures[i].Value;
                Console.WriteLine("{0} = {1}", name, val);
            }
        }
        Console.ReadLine(); // keep the console window open
    }
}
```
And here's the output generated by the above example:
```
Animal = cat
Human = paul
Car = ford
Color = green
```
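If you want to keep the pairs around rather than just print them, one option is to load them into a Dictionary. This is a sketch of my own rather than part of the example above: the AttributeParser class and Parse method are hypothetical names, and it assumes each attribute name is meant to be unique (a repeated name simply keeps its last value).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AttributeParser
{
    static readonly Regex AttributeRegex =
        new Regex(@"Attributes=\(((?'type'\w+)(=)(?'value'\w+)\;\s?)+\)");

    // Pairs the i-th "type" capture with the i-th "value" capture.
    // Returns an empty dictionary when the input doesn't match.
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        Match m = AttributeRegex.Match(input);
        if (!m.Success)
        {
            return result;
        }

        CaptureCollection types = m.Groups["type"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;
        for (int i = 0; i < types.Count; i++)
        {
            // Indexer assignment: a repeated type name keeps its last value.
            result[types[i].Value] = values[i].Value;
        }
        return result;
    }

    static void Main()
    {
        var attrs = Parse("Attributes=(Animal=cat; Human=paul; Car=ford; Color=green;)");
        Console.WriteLine(attrs["Car"]); // ford
    }
}
```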