Matching HTML tag with certain attributes as a condition

Last post 10-09-2008, 4:26 AM by prometheuzz. 3 replies.
  •  10-08-2008, 1:05 PM 47016

    Matching HTML tag with certain attributes as a condition

    I want to match an HTML tag and extract one of its attributes using PCRE in PHP, but only if the tag also has certain other attributes. Example:

    I want to extract the URL from an "a" tag but only if it has the class XY assigned.

    possible matches:

    <a href="http://example.com" class="XY">

    <a class="XY" href="http://example.com">

    <a class="XY" someotherattr="anything" href="http://example.com">

    no match for:

    <a href="http://example.com">

    <a class="ABC" href="http://example.com">

    <a someotherattr="anything" href="http://example.com">

    Now one of my first tries was:

    @<a[^>]+?((class="XY"){1}|(href=".*?"){1}|[^>]){2,}?>@i

    Which is obviously wrong (well, not so obvious to me). Not only does it match tags without the class XY, but what really confused me was that trying it on

     

    <a href="http://www.example.com/" id="whatever" class="XY">

     

    gave me the result

     
    Array
    (
    [0] => Array
    (
    [0] => <a href="http://www.example.com/" id="whatever" class="XY">
    )

    [1] => Array
    (
    [0] => class="XY"
    )

    [2] => Array
    (
    [0] => class="XY"
    )

    [3] => Array
    (
    [0] => href="http://www.example.com/"
    )

    )
     
    Now I'm puzzled not only about what the right expression is, but also about why the heck class="XY" appears twice in the result of my wrong expression.
  •  10-08-2008, 2:03 PM 47020 in reply to 47016

    Re: Matching HTML tag with certain attributes as a condition

    Give this a try:

    <a\s+(?=[^>]*class="XY")[^>]*href="([^"]*)"[^>]*>

    (untested!)

    And don't hesitate to ask for clarification: I am more than happy to provide it if need be!
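
    Since the pattern above is marked untested, here is a minimal sketch that runs it against the six sample tags from the first post. It is written in Python rather than PHP, on the assumption that both engines treat this particular syntax (lookahead, negated character classes) the same way. The names PATTERN, SHOULD_MATCH, SHOULD_NOT_MATCH and extract_url are made up for this illustration.

```python
import re

# Pattern from the reply above: the lookahead requires class="XY" somewhere
# before the closing '>', and group 1 captures the href value.
PATTERN = re.compile(r'<a\s+(?=[^>]*class="XY")[^>]*href="([^"]*)"[^>]*>')

# Sample tags taken from the first post.
SHOULD_MATCH = [
    '<a href="http://example.com" class="XY">',
    '<a class="XY" href="http://example.com">',
    '<a class="XY" someotherattr="anything" href="http://example.com">',
]
SHOULD_NOT_MATCH = [
    '<a href="http://example.com">',
    '<a class="ABC" href="http://example.com">',
    '<a someotherattr="anything" href="http://example.com">',
]


def extract_url(tag):
    """Return the href value if the tag carries class="XY", else None."""
    m = PATTERN.search(tag)
    return m.group(1) if m else None


for tag in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(tag, "->", extract_url(tag))
```

    In PHP the pattern would additionally need delimiters, plus the i modifier if case-insensitive matching is wanted, as in the original attempt. Keep in mind that the check is purely textual: a tag written as class="XY other" would not be matched by this sketch.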

  •  10-08-2008, 5:19 PM 47026 in reply to 47020

    Re: Matching HTML tag with certain attributes as a condition

    I did not test that yet because it's too late for me to work on regex ;) but this looks like it won't match if class XY comes after href?

    Besides that, I'm still confused why my previous expression matched class XY twice?

  •  10-09-2008, 4:26 AM 47038 in reply to 47026

    Re: Matching HTML tag with certain attributes as a condition

    schnizZzla:

    I did not test that yet because it's too late for me to work on regex ;) but this looks like it won't match if class XY comes after href?

    No. A "positive look ahead" (what you have been calling an assertion) only checks the text ahead of the current position without consuming it, so the order doesn't matter, even if you use several of them one after another.

    The regex:

    a(?=[^b]*b)(?=[^c]*c)

    will match the 'a' in both these strings:

    "a   b   c"
    "a   c   b"
     
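    A small sketch to try this out, written in Python on the assumption that its re module handles these lookaheads the same way PCRE does (the variable names are only for illustration). A third string without any 'c' is added as a counterexample.

```python
import re

# Two lookaheads sit right after 'a'. Each one scans ahead independently
# from the same position, so the order of 'b' and 'c' in the text is irrelevant.
pattern = re.compile(r'a(?=[^b]*b)(?=[^c]*c)')

results = {s: pattern.search(s) for s in ["a   b   c", "a   c   b", "a   b"]}

for s, m in results.items():
    # span() shows that only the 'a' itself is part of the match
    print(repr(s), "->", m.span() if m else None)
```

    The match covers only the 'a' in both orderings, because the lookaheads consume nothing; the string without a 'c' does not match at all.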

    schnizZzla:

    Besides that, I'm still confused why my previous expression matched class XY twice?

    Because with your regex, you're creating three capturing groups:

    <a[^>]+?((class="XY"){1}|(href=".*?"){1}|[^>]){2,}?>
            ^^          ^    ^          ^        ^
            ||          |    |          |        |
            |+----{2}---+    +----{3}---+        |
            |                                    |
            +-----------------{1}----------------+

    and since there is a {2,}? quantifier after the first group, that group is matched repeatedly and only its last repetition is stored in it. So, with the string

    <a href="http://www.example.com/" id="whatever" class="XY">

    group {1} will hold class="XY" because it's the last match captured by (...){2,} and group {2} holds class="XY" as well. Group 3 will hold the href=...
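
    The group contents described above can be reproduced with a short sketch. It is written in Python, assuming its re module numbers and fills the groups the same way PCRE does for this pattern; the variable names are illustrative.

```python
import re

# The original attempt, with its three capturing groups left as they were.
pattern = re.compile(r'<a[^>]+?((class="XY"){1}|(href=".*?"){1}|[^>]){2,}?>')
tag = '<a href="http://www.example.com/" id="whatever" class="XY">'

m = pattern.search(tag)

# Group 0 is the whole match; groups 1 to 3 are the capturing groups.
groups = [m.group(i) for i in range(4)]
for i, g in enumerate(groups):
    print(i, g)
```

    Group 1 is overwritten on every repetition, so only the text of its final repetition survives, while groups 2 and 3 keep whatever they captured the last time their alternative was used.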

    Note that you can drop the first capturing group by adding ?: right after its opening parenthesis (then it's called a non-capturing group), and also note that {1} in your regex can be left out: it is redundant. That gives the cleaned-up regex:

    <a[^>]+?(?:(class="XY")|(href=".*?")|[^>]){2,}?>

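    A quick sketch of what the pattern above captures, again in Python under the assumption that the syntax behaves as in PCRE (names are illustrative). It also checks one of the "no match" tags from the first post.

```python
import re

# Same pattern with the outer group made non-capturing and {1} removed.
pattern = re.compile(r'<a[^>]+?(?:(class="XY")|(href=".*?")|[^>]){2,}?>')

with_class = pattern.search(
    '<a href="http://www.example.com/" id="whatever" class="XY">'
)
without_class = pattern.search('<a href="http://example.com">')

# Only two groups remain: 1 is the class attribute, 2 is the href attribute.
print(with_class.groups())

# A tag without class="XY" is still accepted; the class group is simply unset.
print(without_class is not None, without_class.group(1))
```

    So tidying the groups removes the duplicate capture, but it does not change which tags the pattern accepts: because [^>] is one of the alternatives, a tag without class="XY" still matches.
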
    But, not to sound too vain, I like my suggestion better.

    ; )

     
