Wayne's Regex Rants

LookAhead LookBehind Subtleties

Today I came across an example that nicely illustrates the use of a zero-width look-behind assertion. A zero-width assertion is an expression that may match text but does not consume any characters from the input string.
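
As a quick illustration of "zero-width", here is a minimal sketch in Python (the `re` module and the `foobar` sample are my own choices for illustration, not from the original post). Both assertions check the neighbouring text, but the matched text and its boundaries never include what the assertion inspected.

```python
import re

text = "foobar"  # hypothetical sample string

# Look-ahead: match "foo" only when "bar" follows; "bar" is checked, not consumed.
ahead = re.search(r"foo(?=bar)", text)

# Look-behind: match "bar" only when "foo" precedes; "foo" is checked, not consumed.
behind = re.search(r"(?<=foo)bar", text)

print(ahead.group(0), ahead.end())      # foo 3
print(behind.group(0), behind.start())  # bar 3
```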

I wanted to extract an entire <img> tag from a chunk of text, while also verifying that the tag was self-closed. A starting pattern might look like this:

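<img [^/]* />

That works unless the img tag contains a forward-slash somewhere else, for example in the path of its src attribute. The character class [^/]* cannot match past the first forward-slash, so such a tag is never matched.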
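
A minimal sketch of that failure, assuming Python's `re` module and a made-up tag with a path in its src attribute (both are illustrative assumptions, not part of the original post):

```python
import re

# First attempt from the post, with the readability spaces removed.
NO_SLASH = re.compile(r"<img[^/]*/>")

simple = "<img src='foo' />"
with_path = "<img src='images/foo.png' />"  # hypothetical tag with an embedded slash

simple_match = NO_SLASH.search(simple)
path_match = NO_SLASH.search(with_path)

print(simple_match.group(0))  # <img src='foo' />
print(path_match)             # None
```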

One way to correct for that while keeping the pattern nice and simple is to use a zero-width assertion. Applying a look-ahead assertion would produce:

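<img [^>]* (?=/)>

But that doesn't quite work. When applied to the sample text:

<img src='foo' />

the character class consumes characters up to and including the forward-slash. The assertion then 'looks ahead' for a forward-slash, but it finds only the greater-than sign, and so it fails. Backtracking does not rescue the match either: the look-ahead requires the next character to be a forward-slash, while the literal that follows it requires that same character to be a greater-than sign, so no position can satisfy both.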
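
The failure is easy to reproduce. The following is a small sketch assuming Python's `re` module (the original post targets .NET, so treat this as an equivalent illustration rather than the author's own test):

```python
import re

# Look-ahead version from the post, with the readability spaces removed.
LOOKAHEAD = re.compile(r"<img[^>]*(?=/)>")

sample = "<img src='foo' />"
lookahead_match = LOOKAHEAD.search(sample)

# The look-ahead wants "/" next, the literal wants ">" next: never both.
print(lookahead_match)  # None
```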

The solution, then, is to look backward to find the forward-slash:

<img [^>]* (?<=/)>

Note: The regex patterns above include extra spaces in them to make them more readable; remove the spaces or apply the 'IgnorePatternWhitespace' option when using them.
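
Here is a sketch of the look-behind pattern in Python, assuming `re.VERBOSE` as a stand-in for the 'IgnorePatternWhitespace' option (the sample tags are hypothetical). The pattern is written with its spaces intact to show that the flag makes them insignificant.

```python
import re

# Look-behind version from the post, spaces kept; re.VERBOSE ignores them.
# Note that the space after "img" is ignored as well.
LOOKBEHIND = re.compile(r"<img [^>]* (?<=/)>", re.VERBOSE)

self_closed = "<p><img src='images/foo.png' /></p>"  # hypothetical sample
not_closed = "<p><img src='foo'></p>"                # hypothetical sample

closed_match = LOOKBEHIND.search(self_closed)
open_match = LOOKBEHIND.search(not_closed)

print(closed_match.group(0))  # <img src='images/foo.png' />
print(open_match)             # None
```

The look-behind checks the character just before the closing greater-than sign without consuming it, so embedded slashes earlier in the tag no longer matter.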

-Wayne


Taking another look at this, I see I got carried away with using the 'cool' look-behind functionality. For the specific problem I was trying to solve — that is, extract an entire <img> tag from a chunk of text while also verifying that the tag is self-closed — use of the look-behind feature is overkill. This 'traditional' regex will do that:

<img [^>]* />
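
The simpler pattern works because the engine backtracks: [^>]* first consumes the trailing forward-slash, then gives it back so that the literal /> can match. A minimal sketch in Python, with made-up sample markup, comparing it against the look-behind version:

```python
import re

# Patterns from the post, with the readability spaces removed.
TRADITIONAL = re.compile(r"<img[^>]*/>")
LOOKBEHIND = re.compile(r"<img[^>]*(?<=/)>")

# Hypothetical markup: one self-closed tag, one that is not.
html = "<p><img src='images/a.png' /> and <img src='b.png'></p>"

found = TRADITIONAL.findall(html)
found_lookbehind = LOOKBEHIND.findall(html)

print(found)                      # ["<img src='images/a.png' />"]
print(found == found_lookbehind)  # True
```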

Published Monday, April 12, 2004 7:13 PM by wayneking

Comments

 

wayneking said:

Nice post Wayne, thanks for the update too :-)

I actually used this pattern last night when I needed to match IMG tags - and I didn't pick up that the lookbehind was overkill either.
April 19, 2004 4:10 PM
 

wayneking said:

Wayne, here's an article which, while not directly linked to this pattern, does mention it as the source of much frustration for Justin and me :-)

http://weblogs.asp.net/Justin_Rogers/archive/2004/08/04/208473.aspx

I thought that you might enjoy reading it!

Hope you are well.
August 4, 2004 9:21 PM