
Michael Ash's Regex Blog

Regex Musings

Named Groups to the Rescue

I was asked to modify some text that had been built incorrectly: basically, insert some text at a certain point. First I use a regex to find the text, then insert the new value within that match. Since the inserted value goes inside the matched text, I simply wanted to use backreferences and the Replace method. Simple, right? Well, not so much.

The text is in a field of various rows of a database table, and the text to be inserted is an alphanumeric value that comes from another field in the same row. Because the inserted value is dynamic, I can't hard-code the replacement text; it has to be built for each row. The text to be modified is a certain attribute somewhere in the text. For this example let's say it's "id=xyz", which is constant for all records. The new text will be inserted right after the equals sign.

So for

source = "{some stuff} id=xyz {more stuff}"
newText = "ab1"

you get

"{some stuff} id=ab1xyz {more stuff}"

Simple enough. You use the regex \bid=xyz\b to match the text, then split it into groups so you can use backreferences in the replace. So your final regex looks like this:

 \b(id=)(xyz)\b

Now group 1 contains the text up to your insertion point (id=)

And group 2 contains the text after your insertion point (xyz)

So the replacement string for your regex is "$1(new data goes here)$2", where (new data goes here) = some alphanumeric value pulled from a second field in the row.

 
Doing this in .NET, my code looked something like this pseudo-code:

Regex regexFind = new Regex(@"\b(id=)(xyz)\b");

Get records

For each row
    fieldA = rowFieldA    (source text)
    fieldB = rowFieldB    (insert value)
    fieldA = regexFind.Replace(fieldA, String.Format("$1{0}$2", fieldB))
Next

The String.Format method creates a replacement string for each row.

Look good? It works fine for our example, but there is a problem.

For our example value of "ab1", String.Format produces "$1ab1$2", which is exactly what we want. But this field is alphanumeric, so it could begin with a digit, and that causes a problem. Say for the next record the value to be inserted is "12a": the Format method produces a replacement string of "$112a$2", which is not good. Syntactically it's fine, but instead of inserting text between group 1 and group 2, it tries to insert text between group 112 and group 2. As there is no group 112, the engine treats $112 as literal text, so group 1 never makes it into the output and "id=xyz" becomes "$112axyz".
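To make the failure concrete, here is a minimal C# sketch of the numbered-group approach described above. The class and method names (NumberedGroupDemo, Insert) are made up for illustration and are not from the original code; the pattern and the format string are the ones from the post.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class NumberedGroupDemo
{
    // Group 1 is "id=", group 2 is "xyz". The verbatim string (@"...")
    // keeps \b as a regex word boundary rather than a C# backspace escape.
    private static readonly Regex FindRegex = new Regex(@"\b(id=)(xyz)\b");

    // Builds the replacement pattern with String.Format, as in the pseudo-code.
    public static string Insert(string source, string newText)
    {
        string replacement = string.Format("$1{0}$2", newText);
        return FindRegex.Replace(source, replacement);
    }
}
```

Called with the source text from the example, Insert(source, "ab1") gives "{some stuff} id=ab1xyz {more stuff}", while Insert(source, "12a") builds the pattern "$112a$2" and gives "{some stuff} $112axyz {more stuff}".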

OK, this is where named groups become handy (necessary?). If you use named groups in your regex and replacement string, you can avoid this problem.

Change the regex to \b(?<att>id=)(?<val>xyz)\b

And your replacement string to "${att}(new data goes here)${val}"

Now if you are using the String.Format method, there is one more hoop to jump through. The regex engine and the Format method both use curly braces, so there is a conflict and the Format method will complain. You have to write it like this:

String.Format("${0}att{1}{2}${0}val{1}","{","}",newValue)

To get the desired replacement string.
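As a rough sketch of the named-group version, the C# below builds the replacement string two ways: with the brace-as-argument trick shown above, and with String.Format's own escape of doubling the braces ({{ and }}), which is not mentioned in the post but yields the same string. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class NamedGroupDemo
{
    private static readonly Regex FindRegex =
        new Regex(@"\b(?<att>id=)(?<val>xyz)\b");

    // The approach from the post: pass the braces in as format arguments.
    public static string BuildReplacement(string newValue)
    {
        return string.Format("${0}att{1}{2}${0}val{1}", "{", "}", newValue);
    }

    // Alternative: "{{" and "}}" are String.Format escapes for literal braces.
    public static string BuildReplacementEscaped(string newValue)
    {
        return string.Format("${{att}}{0}${{val}}", newValue);
    }

    public static string Insert(string source, string newValue)
    {
        return FindRegex.Replace(source, BuildReplacementEscaped(newValue));
    }
}
```

Both builders return "${att}12a${val}" for the value "12a", and Insert then turns "id=xyz" into "id=12axyz" because the group name ends at the closing brace, so the digits after it cannot be read as part of the reference.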
 

When I started writing this, I thought named groups were the only way to get this to work, which would mean you could only solve it with a regex engine, like .NET, that supports them. But I've since thought of a second way.

Still, whenever a group in your replacement string can be followed by a digit, you may want to consider using named groups to avoid surprises.
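One related option in .NET, offered here as an aside and not necessarily the second way alluded to above: the braced form also accepts a group number, so ${1} can be followed by digits safely without renaming the groups. A minimal sketch, with a made-up class name:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class BracedNumberDemo
{
    // Same unnamed groups as the original pattern.
    private static readonly Regex FindRegex = new Regex(@"\b(id=)(xyz)\b");

    public static string Insert(string source, string newValue)
    {
        // The braces end the group number, so "${1}12a" means group 1
        // followed by the literal text "12a". Plain concatenation also
        // sidesteps the String.Format brace conflict.
        return FindRegex.Replace(source, "${1}" + newValue + "${2}");
    }
}
```

With the example source, Insert(source, "12a") gives "{some stuff} id=12axyz {more stuff}". This assumes the inserted value contains no "$" characters, which holds for the alphanumeric field described in the post.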

 


Published Wednesday, September 28, 2005 3:06 PM by mash
