What Michael is alluding to is that, especially for replacements, you need to account for EVERY character from the first one that takes part in the overall match to the last.
Also, you need to understand that the regex engine will assume that there is a set of parentheses around the whole pattern, which it refers to as Match Group #0. If you were to simply try to use the pattern:
\b(?:ma?c)?(\w)
on the text lines:
mcdonald
macdonald
wombat
you would get a match of "mcd", "macd" and "w" from the regex engine for each line. You would ALSO get "d", "d" and "w" in match group #1 for each match. (There is no match group #2 in this pattern).
The key thing to understand is that a non-capturing group will not create a separately numbered (or named) match group, but the characters WILL be included in the overall match group #0 text. You cannot "skip over" characters in this group.
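If you want to see this for yourself, here is a quick sketch in Python (my choice of language for illustration; the variable names are made up) that collects match group #0 and match group #1 for each of the three lines:

```python
import re

# Non-capturing group for the optional "mc"/"mac" prefix, one capturing group for the next letter.
pattern = re.compile(r"\b(?:ma?c)?(\w)")
lines = ["mcdonald", "macdonald", "wombat"]

results = []
for line in lines:
    m = pattern.search(line)
    # group(0) is the overall match; group(1) is the only numbered group.
    results.append((m.group(0), m.group(1)))
    print(line, "-> group 0:", m.group(0), "| group 1:", m.group(1))

# The pattern defines exactly one numbered group, so there is no group #2.
print("numbered groups:", pattern.groups)
```

Note that the "mc" and "mac" show up in group 0 even though the group that matched them is non-capturing.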
The problem comes when you try to use the regex "replace" function. What it will do is remove all of the matched characters (i.e. those in match group #0) and then use your replacement string to build up whatever should go in their place. Therefore if you simply tried to use the '$1' replacement string on the above text you would get
donald
donald
wombat
as the "mcd", "macd" and "w" have been replaced by the "d", "d" and "w" respectively. The remaining characters have not been touched at all.
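The same replacement can be tried out with a small Python sketch. One assumption to be aware of: Python's re module writes the group reference as \1 where the '$1' form is used above, but the behaviour is the same.

```python
import re

pattern = re.compile(r"\b(?:ma?c)?(\w)")
lines = ["mcdonald", "macdonald", "wombat"]

# Everything in match group #0 is removed and replaced by group #1 only,
# so the non-captured "mc"/"mac" prefix is lost.
replaced = [pattern.sub(r"\1", line) for line in lines]

for before, after in zip(lines, replaced):
    print(before, "->", after)
```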
Therefore, if you want to do something with the "d" but leave the preceding characters (i.e. the "mc", "mac" or null string in each of the above examples) alone, you will need to account for them in the replacement string. In order to be able to refer to them, you must have captured the matching text. So to (say) put a "%" before the first letter (ignoring the "mc" or "mac" parts), you would need a pattern of
\b(ma?c)?(\w)
and a replacement string of
$1%$2
The pattern basically works the same way as before, but this time we need to capture the "mc" or "mac" text if it is there so we can put it back into the replacement string. If there is anything matched, it will go into match group #1. The next letter will go into match group #2.
The replacement string is built up from whatever text was captured by match group #1, the "%" character and the text captured by match group #2. If the string was "wombat", then match group #1 will contain a null string, which is not a problem in this case: the end result will simply be that the replacement string is "%w". Remembering that the replacement function will remove ALL matched characters (i.e. match group #0), it will delete the "w" from "wombat" and replace it with "%w", resulting in "%wombat".
If the original string was "mcdonald", then match group #1 will be "mc" so the replacement string is "mc%d". Again, the regex engine will remove all matched characters - the "mcd" in this case, and replace it with the built-up string to give "mc%donald".
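As a rough check of the capturing version, here is a Python sketch. It assumes Python 3.5 or later, where a group that did not take part in the match is treated as an empty string in the replacement; again \1 and \2 stand in for $1 and $2.

```python
import re

# Both the optional prefix and the following letter are now captured.
pattern = re.compile(r"\b(ma?c)?(\w)")
lines = ["mcdonald", "macdonald", "wombat"]

# Put group #1 back, add the "%", then put group #2 back.
replaced = [pattern.sub(r"\1%\2", line) for line in lines]

for before, after in zip(lines, replaced):
    print(before, "->", after)
```

For "wombat" the first group does not participate in the match at all, so nothing is put back before the "%".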
Susan