
Replace word, but not in img tag

Last post 08-01-2010, 7:25 PM by Aussie Susan. 1 replies.
  •  07-30-2010, 10:27 AM 70280

    Replace word, but not in img tag

    Hi, bit of a RegEx newbie here, so I'm not sure if this is even possible, but I need to wrap certain words in my HTML (using preg_replace in PHP) with some tags. I've managed to do this, but it's causing errors in img tags because it also replaces the words in the alt attribute of the img. So far I have the following:

    $regex = ('#(\b[^>])('.preg_quote($glossItem).')(\b)#ei');
    $string = preg_replace($regex, "(' <a class=\"tt2\" rel=\"tt2$i\">$2</a>')", $string);

    with $glossItem being the word I'm looking for and $string being a big ol' lump of HTML.

    Is there a way of excluding the words in an img tag using a regular expression?

  •  08-01-2010, 7:25 PM 70344 in reply to 70280

    Re: Replace word, but not in img tag

    It should be possible - in fact there are many questions in this forum that are (almost) exactly the same as yours.

    I say "almost" because I'm not entirely sure what you are trying to do. In the posting guidelines in the sticky note at the beginning of this forum we ask that you supply us with some real examples of the text you are trying to process. In this case it would really help to see what $string and $glossItem look like as an example. Also, as you say that your pattern works in some cases but not others, can you please give us examples of both so we can test any pattern against both.

    I'm not sure what you are trying to achieve with some parts of your pattern. For example '\b' is what is known as a "zero-width anchor" in that it checks to see if there is a word character on one side and a non-word character on the other (effectively looking for either the start or end of a word) but it does not actually match any character. Therefore the '(\b)' match group at the end of your pattern will always match nothing.
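
    To make the zero-width point concrete, here is a minimal sketch. The sample text is made up for illustration and is not from the original post. It captures the boundary in its own group and shows that the group comes back as an empty string:

```php
<?php
// Hypothetical sample text, not taken from the original post.
$text = 'Hello world';

// Capture the word boundary at the end of "Hello" in its own group.
preg_match('/Hello(\b)/', $text, $m);

$whole    = $m[0]; // the full match: "Hello"
$boundary = $m[1]; // what '(\b)' captured: "" (an empty string)

var_dump($whole, $boundary);
```

    The group takes part in the match, so it is present in the result, but it never consumes a character.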

    I'm also trying to understand the '(\b[^>])' part of your pattern. This will match either the first character of a word (as in the "H" in " Hello"; the boundary sits between the preceding non-word character and the matched letter) or the first non-word character after the end of a word, as long as that character is not a ">" (as in the ";" in "Hello;" but nothing in "Hello>"). As this is followed by the word you are looking for, I assume you intend the latter (otherwise you would incorrectly match, say, "know" when the $glossItem value is "now"), but I suspect that this may be ambiguous.
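
    A quick way to see both readings is to run the first group of the posted pattern, '(\b[^>])', on its own and then the whole pattern against a short string. The inputs below are hypothetical, chosen only to show each case:

```php
<?php
// The first group from the posted pattern, on its own.
$group = '/(\b[^>])/';

// Hypothetical inputs, one per case.
$samples = [' Hello', 'Hello;', 'Hello>'];

$firstGroup = [];
foreach ($samples as $s) {
    preg_match_all($group, $s, $m);
    $firstGroup[$s] = $m[1];
}
// ' Hello' => ['H']       (first letter of the word)
// 'Hello;' => ['H', ';']  (first letter, then the non-word character after the word)
// 'Hello>' => ['H']       (the '>' is rejected by [^>])
print_r($firstGroup);

// The full pattern with "now" as the word: both readings fire.
preg_match_all('#(\b[^>])(now)(\b)#i', 'I know now', $m);
$fullMatches = $m[0]; // ['know', ' now']
print_r($fullMatches);
```

    So the same pattern picks up "now" inside "know" as well as the standalone word, which is the ambiguity I mean.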

    Also, you will be deleting the character matched by that first part as I can't see any reference to it in the replacement string.
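
    Here is a small sketch of that effect. It is a simplification of the posted code: I have dropped the 'e' modifier (which newer PHP versions no longer support) and the rel attribute, and the word and input string are invented for the example:

```php
<?php
$glossItem = 'cat';              // hypothetical glossary word
$string    = 'A black-cat sat.'; // hypothetical input

$regex = '#(\b[^>])(' . preg_quote($glossItem, '#') . ')(\b)#i';

// As posted (simplified): the replacement starts with a space and never uses $1,
// so the "-" matched by the first group is lost.
$dropped = preg_replace($regex, ' <a class="tt2">$2</a>', $string);
// 'A black <a class="tt2">cat</a> sat.'

// Putting $1 back keeps the original character.
$kept = preg_replace($regex, '$1<a class="tt2">$2</a>', $string);
// 'A black-<a class="tt2">cat</a> sat.'

echo $dropped, "\n", $kept, "\n";
```

    When the character in front of the word happens to be a space you will not notice the difference, which may be why it looked like it was working.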

    Finally, in most cases where you are trying to extract and/or modify information from HTML or XML files, it is better to use the HTML or XML DOM as it is designed for exactly this type of work - much simpler and more reliable in almost every case.
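
    As a rough sketch of the DOM route, assuming PHP's bundled DOM extension (DOMDocument and DOMXPath): walk only the text nodes, so attribute values such as alt are never touched. The sample HTML and word are invented, the rel attribute from the posted code is left out, and non-ASCII input would need its encoding handled before loadHTML:

```php
<?php
// Hypothetical input; the original post did not include sample HTML.
$html      = '<p>A cat sat.</p><img src="cat.png" alt="cat">';
$glossItem = 'cat';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML('<html><body>' . $html . '</body></html>');
$xpath = new DOMXPath($doc);

$pattern = '/\b(' . preg_quote($glossItem, '/') . ')\b/i';

// Copy the text nodes into an array first, because the loop modifies the tree.
$textNodes = iterator_to_array($xpath->query('//body//text()'));
foreach ($textNodes as $node) {
    // Odd-numbered parts are the matched words, even-numbered parts the text between.
    $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    if (count($parts) < 2) {
        continue; // word not present in this text node
    }
    $frag = $doc->createDocumentFragment();
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $a = $doc->createElement('a');
            $a->setAttribute('class', 'tt2');
            $a->appendChild($doc->createTextNode($part));
            $frag->appendChild($a);
        } elseif ($part !== '') {
            $frag->appendChild($doc->createTextNode($part));
        }
    }
    $node->parentNode->replaceChild($frag, $node);
}

// Serialise the children of <body> back into an HTML string.
$result = '';
foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result, "\n";
```

    The word in the paragraph gets wrapped in the link, while the alt text of the img is left alone because it is an attribute, not a text node.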

    Susan
