I want to match an HTML tag and extract one of its attributes using PCRE in PHP, but only if the tag also has certain other attributes. For example:
I want to extract the URL from an "a" tag, but only if it has the class XY assigned.
These should match:
<a href="http://example.com" class="XY">
<a class="XY" href="http://example.com">
<a class="XY" someotherattr="anything" href="http://example.com">
These should not match:
<a href="http://example.com">
<a class="ABC" href="http://example.com">
<a someotherattr="anything" href="http://example.com">
One of my first attempts was:
@<a[^>]+?((class="XY"){1}|(href=".*?"){1}|[^>]){2,}?>@i
This is obviously wrong (well, not so obvious to me). Not only does it match tags without the class XY, but what really confused me is that running it on
<a href="http://www.example.com/" id="whatever" class="XY">
gave me the result
Array
(
[0] => Array
(
[0] => <a href="http://www.example.com/" id="whatever" class="XY">
)
[1] => Array
(
[0] => class="XY"
)
[2] => Array
(
[0] => class="XY"
)
[3] => Array
(
[0] => href="http://www.example.com/"
)
)
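For reference, here is a minimal script that should reproduce the result above. The original call is not shown, so using preg_match_all with its default PREG_PATTERN_ORDER flag and print_r for display are both assumptions, as are the variable names:

```php
<?php
// Assumed reproduction: pattern and subject are copied from the question,
// the surrounding call and variable names are illustrative only.
$pattern = '@<a[^>]+?((class="XY"){1}|(href=".*?"){1}|[^>]){2,}?>@i';
$subject = '<a href="http://www.example.com/" id="whatever" class="XY">';

// With the default PREG_PATTERN_ORDER flag, $matches[0] holds the full
// matches and $matches[n] holds what capture group n captured.
// Group 1 is the outer alternation group, group 2 is the class="XY"
// alternative and group 3 is the href="..." alternative.
$count = preg_match_all($pattern, $subject, $matches);

print_r($matches);
```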
Now I'm puzzled not only about what the right expression is, but also about why the heck class="XY" appears twice in the result of my wrong expression.