I was asked to fix some text that had been built incorrectly: basically, insert some new text at a certain point. First I use a regex to find the text, then insert the new value within that match. Since the inserted value goes inside the matched text, I simply wanted to use backreferences and the Replace method. Simple, right? Well, not so much.
The text lives in a field in various rows of a database table, and the text to be inserted is an alphanumeric value from another field in the same row. Because the inserted value is dynamic, I can't simply hard-code the replacement text; it has to be built for each row. The text to be modified is a certain attribute somewhere in the field. For this example, let's say it's "id=xyz", which is the same for all records. The new text will be inserted right after the equals sign.
So for

    source  = "{some stuff} id=xyz {more stuff}"
    newText = "ab1"

you get

    "{some stuff} id=ab1xyz {more stuff}"
Simple enough. You use the regex \bid=xyz\b to match the text, then split it into groups so you can use backreferences in the replacement. So your final regex looks like this:

    \b(id=)(xyz)\b
Group 1 contains the text up to your insertion point (id=), and group 2 contains the text after it (xyz). So your replacement string is "$1(new data goes here)$2", where (new data goes here) is the alphanumeric value pulled from the second field in the row.
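As a quick check of the basic idea, here is a minimal sketch in C# using the .NET Regex class and the example values above. It works on a plain string instead of a database field, and the variable names are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 ("id=") is the text before the insertion point, group 2 ("xyz") the text after it.
// The @ prefix keeps \b a regex word boundary instead of a C# backspace escape.
var regexFind = new Regex(@"\b(id=)(xyz)\b");

string source = "{some stuff} id=xyz {more stuff}";
string newText = "ab1";

// The replacement string becomes "$1ab1$2": group 1, the new text, then group 2.
string result = regexFind.Replace(source, "$1" + newText + "$2");
Console.WriteLine(result); // {some stuff} id=ab1xyz {more stuff}
```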
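Doing this in .NET, my code looked something like this (data access is left out; table stands for a DataTable holding the records, and the column names FieldA and FieldB are placeholders):

    using System;
    using System.Data;
    using System.Text.RegularExpressions;

    // @ keeps \b a regex word boundary; in a normal C# string "\b" is a backspace
    Regex regexFind = new Regex(@"\b(id=)(xyz)\b");

    // table: the records to fix; "FieldA"/"FieldB" are placeholder column names
    foreach (DataRow row in table.Rows)
    {
        string fieldA = (string)row["FieldA"];   // source text
        string fieldB = (string)row["FieldB"];   // value to insert
        row["FieldA"] = regexFind.Replace(fieldA, String.Format("$1{0}$2", fieldB));
    }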
String.Format builds a new replacement string for each row.
Look good? It works fine for our example, but there is a problem.
For our example value of "ab1", String.Format produces "$1ab1$2", which is exactly what we want. But this field is alphanumeric, so the value could begin with a number, and that causes a problem. Say the value to be inserted for the next record is "12a": the format method produces the replacement string "$112a$2", which is not good. Syntactically it's fine, but it's not what we want: instead of inserting text between group 1 and group 2, the regex engine reads "$112" as a reference to group 112. As there is no group 112, it treats "$112" as literal text, so the match "id=xyz" becomes "$112axyz" and the "id=" captured by group 1 is lost.
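A minimal sketch that reproduces the problem, assuming the same example text and the .NET Regex class. The expected output follows the documented .NET rule that a $number naming a group that doesn't exist is copied into the result literally.

```csharp
using System;
using System.Text.RegularExpressions;

var regexFind = new Regex(@"\b(id=)(xyz)\b");
string source = "{some stuff} id=xyz {more stuff}";
string newText = "12a"; // the inserted value starts with a digit

string replacement = String.Format("$1{0}$2", newText); // "$112a$2"

// .NET reads all the digits after "$" as one group number, so "$112" means group 112.
// That group doesn't exist, so "$112" is copied literally and group 1 ("id=") is dropped.
string result = regexFind.Replace(source, replacement);
Console.WriteLine(result); // {some stuff} $112axyz {more stuff}
```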
This is where named groups become handy (necessary?). If you use named groups in your regex and replacement string, you can avoid this problem, because the braces in ${name} mark exactly where the group reference ends. Change the regex to

    \b(?<att>id=)(?<val>xyz)\b

and your replacement string to

    "${att}(new data goes here)${val}"
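Here is the same sketch rewritten with named groups, still assuming the example text and the problem value "12a". The group names att and val are the ones used above.

```csharp
using System;
using System.Text.RegularExpressions;

// Named groups: "att" holds the text before the insertion point, "val" the text after it.
var regexFind = new Regex(@"\b(?<att>id=)(?<val>xyz)\b");
string source = "{some stuff} id=xyz {more stuff}";
string newText = "12a";

// "${att}12a${val}": the closing brace ends each group name,
// so the leading digits of newText are treated as literal text.
string result = regexFind.Replace(source, "${att}" + newText + "${val}");
Console.WriteLine(result); // {some stuff} id=12axyz {more stuff}
```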
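If you build the replacement with String.Format, there is one more hoop to jump through. The regex substitution syntax and the format method both use curly braces, so a bare ${att} in the format string is read as a malformed format item and String.Format throws a FormatException. One way around this is to pass the braces in as arguments:

    String.Format("${0}att{1}{2}${0}val{1}", "{", "}", newValue)

which produces the desired replacement string, for example "${att}12a${val}" when newValue is "12a". Escaping the literal braces by doubling them ({{ and }}) in the format string also works.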
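A small sketch to confirm that the brace-argument trick, doubled braces, and a C# interpolated string (which uses the same doubling rule) all build the same replacement string. The value "12a" is just the example from above.

```csharp
using System;

string newValue = "12a";

// Pass the braces in as format arguments {0} and {1}.
string viaArgs = String.Format("${0}att{1}{2}${0}val{1}", "{", "}", newValue);

// Escape literal braces in the format string by doubling them.
string viaDoubling = String.Format("${{att}}{0}${{val}}", newValue);

// Interpolated strings escape literal braces the same way.
string viaInterpolation = $"${{att}}{newValue}${{val}}";

Console.WriteLine(viaArgs);          // ${att}12a${val}
Console.WriteLine(viaDoubling);      // ${att}12a${val}
Console.WriteLine(viaInterpolation); // ${att}12a${val}
```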
When I started writing this, I thought this was the only way to get it to work, which would mean you could only solve the problem with a regex engine, like .NET's, that supports named groups. But I've thought of a second way.
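That second way isn't spelled out here, so the following is not necessarily it. One option .NET itself offers is to put a group number in braces: ${1} refers to group 1, and the closing brace ends the number, so named groups aren't strictly required. A minimal sketch with the original numbered regex; the format string doubles the braces so String.Format passes them through.

```csharp
using System;
using System.Text.RegularExpressions;

// The original numbered groups; no names needed.
var regexFind = new Regex(@"\b(id=)(xyz)\b");
string source = "{some stuff} id=xyz {more stuff}";
string newText = "12a";

// Builds "${1}12a${2}": the braces delimit the group numbers,
// so "12a" can't be swallowed into the reference to group 1.
string replacement = String.Format("${{1}}{0}${{2}}", newText);
string result = regexFind.Replace(source, replacement);
Console.WriteLine(result); // {some stuff} id=12axyz {more stuff}
```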
In any case, whenever a group reference in your replacement string can be followed by a digit, consider using named groups to avoid unpleasant surprises.
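Another option, again an illustration rather than the method described above, is to skip replacement-string syntax altogether by passing a MatchEvaluator (here a lambda) to Regex.Replace. The replacement is then built in code from the match's groups, so neither a leading digit nor a "$" in the inserted value gets interpreted. A minimal sketch:

```csharp
using System;
using System.Text.RegularExpressions;

var regexFind = new Regex(@"\b(id=)(xyz)\b");
string source = "{some stuff} id=xyz {more stuff}";
string newText = "12a";

// The lambda converts to a MatchEvaluator. Its return value is used verbatim,
// so no substitution syntax ($1, ${name}, $$) is parsed at all.
string result = regexFind.Replace(source, m => m.Groups[1].Value + newText + m.Groups[2].Value);
Console.WriteLine(result); // {some stuff} id=12axyz {more stuff}
```

In the row loop, the lambda would use fieldB in place of newText.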