
PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

Last post 11-18-2008, 11:53 AM by jflesher. 9 replies.
  •  11-17-2008, 2:43 PM 48395

    PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    In PHP, using preg_match_all, I need to extract all anchor tags into an array that holds each tag's href, title and innerHTML.

    <a href="http://mydomain.com/" title="My Title">My InnerHTML</a>
    <a href="http://mydomain.com/">My InnerHTML</a>
    <a href="http://mydomain.com/" title="My Title"><img scr="" /></a>

    Array
    (
        [0] => Array
            (
                [0] => "http://mydomain.com/"
                [1] => "My Title"
                [2] => "My InnerHTML"
            )
        [1] => Array
            (
                [0] => "http://mydomain.com/"
                [1] => ""
                [2] => "My InnerHTML"
            )
        [2] => Array
            (
                [0] => "http://mydomain.com/"
                [1] => "My Title"
                [2] => "<img scr="" />"
            )
    )

    I'm currently using "'<\s*a\s.*?href\s*=\s*([\"\'])?(?(1) (.*?)\\1 | ([^\s\>]+))'isx";
    With my limited regex experience, I assumed I could just add ?title\s*=\s*([\"\']) to it,
    but every attempt has failed, and so have my searches for an expression that works.


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-17-2008, 2:54 PM 48396 in reply to 48395

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    Give this a try:

    '#<a\s+href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?\s*>((?:(?!</a>).)*)</a>#i'
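
As a minimal sketch, assuming the three sample links from the question are joined into one string, the suggested pattern can be run through preg_match_all with the PREG_SET_ORDER flag so that each anchor comes back as its own row. The variable names ($html, $pattern, $links) are illustrative and not taken from the thread.

```php
<?php
// Sample anchors from the original question (the "scr" spelling is kept as posted).
$html = '<a href="http://mydomain.com/" title="My Title">My InnerHTML</a>' . "\n"
      . '<a href="http://mydomain.com/">My InnerHTML</a>' . "\n"
      . '<a href="http://mydomain.com/" title="My Title"><img scr="" /></a>';

// Pattern as suggested in the reply above.
$pattern = '#<a\s+href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?\s*>((?:(?!</a>).)*)</a>#i';

// PREG_SET_ORDER groups the results per anchor instead of per capture group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$links = array();
foreach ($matches as $m) {
    // Drop $m[0] (the whole tag) and keep href, title, innerHTML as indexes 0..2.
    $links[] = array($m[1], $m[2], $m[3]);
}

print_r($links);
```

A link without a title still yields three elements; its title slot is an empty string, which matches the layout asked for in the question.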

  •  11-17-2008, 5:04 PM 48406 in reply to 48396

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    Thanks. It took me a while to figure out why it wasn't returning all my links, until I noticed that the missing ones had other attributes such as target="_blank". Also, the img tag in the innerHTML isn't showing up. How do I include these cases in the expression?

    Here is the kind of link this will have to work with:
    <a id="myLink" class="myClass" href="http://mydomain.com/#bookmark" title="My Title" target="_blank">innerHTML may be an <img /></a>

    Sorry I didn't ask for this from the start. 

    Thanks so much for your help


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-18-2008, 12:59 AM 48421 in reply to 48406

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    My original proposal can be used with a minor tweak:

    '#<a\s+.*?href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i'
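
A short sketch of how the tweaked pattern behaves, using the link from the previous post. The second input ($reordered) is a hypothetical link that does not appear in the thread; it is included only to show that the optional title group is tried immediately after href, so a title placed later in the tag is not captured.

```php
<?php
// Tweaked pattern from the reply above.
$pattern = '#<a\s+.*?href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i';

// Link from the previous post: extra attributes before href and after title.
$full = '<a id="myLink" class="myClass" href="http://mydomain.com/#bookmark" title="My Title" target="_blank">innerHTML may be an <img /></a>';
preg_match($pattern, $full, $a);
// $a[1] is the href, $a[2] the title, $a[3] the innerHTML.

// Hypothetical link (not from the thread) with target between href and title.
$reordered = '<a href="/x" target="_blank" title="T">y</a>';
preg_match($pattern, $reordered, $b);
// The title group is skipped here, so $b[2] comes back as an empty string.

var_dump($a[1], $a[2], $a[3], $b[2]);
```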

  •  11-18-2008, 1:07 AM 48423 in reply to 48421

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    I can't thank you enough.

    I've been programming for over 30 years now; I have learning regexp on my bucket list.


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-18-2008, 1:20 AM 48424 in reply to 48423

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    I'm going to drive you crazy with this one. It returns two anchor tags in one array entry; my guess is that this happens because the first anchor tag has no href. How do I make the href optional, seeing as it's a section of the expression too?

    <a id="t" class="s" rel="m"><span id="s">Show</span></a>...<a href="/home.pdf" title="PDF" onclick="window.open('...'); return false;" rel="nofollow"><img src="/pdf.png" alt="PDF" /></a>

    On a side note, I can't add the href because of a quirk in some browsers. It's an ugly hack, I know, but I haven't found a way around it.
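
The behaviour described in this post can be reproduced with a short sketch, assuming the pattern in use is the tweaked one from the earlier reply. Because .*? is allowed to run past the end of the first tag while it looks for href=, the match starts at the first anchor and picks up the attributes of the second one.

```php
<?php
// Tweaked pattern from the earlier reply, where href is still mandatory.
$pattern = '#<a\s+.*?href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i';

// Markup from the post above; the "..." parts are literal placeholder text.
$html = <<<'HTML'
<a id="t" class="s" rel="m"><span id="s">Show</span></a>...<a href="/home.pdf" title="PDF" onclick="window.open('...'); return false;" rel="nofollow"><img src="/pdf.png" alt="PDF" /></a>
HTML;

$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// A single match is reported: it begins at the first <a ...> and borrows the
// href and title of the second anchor.
echo $count, "\n";
echo $matches[0][0], "\n";
```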


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-18-2008, 3:05 AM 48427 in reply to 48424

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    jflesher:

    I'm going to drive you crazy with this one. It returns two anchor tags in one array entry; my guess is that this happens because the first anchor tag has no href. How do I make the href optional, seeing as it's a section of the expression too?

    ...

    No problem.

    Look at how I made the "title" attribute optional: I wrapped it in a non-capturing group and placed a '?' after the group. The same approach works for href:

    Mandatory:

    href=[\'"]([^\'"]+)[\'"]


    Optional:

    (?:href=[\'"]([^\'"]+)[\'"])?
    Note that with plain capturing parentheses instead of (?: ), the regex engine would also have stored the entire href="..." in a group, which I presume you don't want.
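
To make the difference concrete, here is a small sketch comparing the two kinds of wrapper on a made-up tag ($tag is an illustrative input, not from the thread). A '?' placed after a group makes the whole group optional; this is separate from the '?' that follows another quantifier, as in .*?, which makes that quantifier lazy (also called reluctant).

```php
<?php
$tag = '<a href="/x">';

// Non-capturing wrapper: only the URL is stored, in group 1.
preg_match('#<a\s+(?:href=[\'"]([^\'"]+)[\'"])?#', $tag, $nonCapturing);

// Capturing wrapper: group 1 now holds the whole attribute and the URL moves to group 2.
preg_match('#<a\s+(href=[\'"]([^\'"]+)[\'"])?#', $tag, $capturing);

print_r($nonCapturing);
print_r($capturing);
```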

    Post back if you have questions: I'd rather teach someone to fish than hand him a fish... and all that.

    ; )


     

  •  11-18-2008, 10:10 AM 48445 in reply to 48427

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    I was trying that last night in vain; I finally came up with this after sleeping on it.

    #<a\s*(?:href=[\'"]([^\'"]+)[\'"])?\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i

    It works great. Doing this last part by myself didn't fulfill my bucket list, but it did remind me of things I've learned over the years. Having grown up using grep, I find this much better and more powerful; when you need one, you really need it.

    This one should be added to the library; it is very useful. 
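
One caveat worth checking before reusing this pattern: the optional href and title groups are only tried directly after <a, so a link whose href is not the first attribute still matches but comes back with empty href and title. The sketch below shows this with the link from earlier in the thread, followed by one possible variant. The variant is an assumption of mine and not something proposed in the thread: it wraps lookaheads in optional groups so href and title can sit anywhere in the opening tag, and it assumes no attribute value contains a '>' character.

```php
<?php
// Final pattern from the post above.
$posted = '#<a\s*(?:href=[\'"]([^\'"]+)[\'"])?\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i';

// Hypothetical variant (not from the thread): optional lookaheads search the
// opening tag for href and title in any position before the tag is consumed.
$variant = '#<a\b(?:(?=[^>]*?\shref=[\'"]([^\'"]*)[\'"]))?(?:(?=[^>]*?\stitle=[\'"]([^\'"]*)[\'"]))?[^>]*>((?:(?!</a>).)*)</a>#i';

// Link from earlier in the thread, with id and class ahead of href.
$link = '<a id="myLink" class="myClass" href="http://mydomain.com/#bookmark" title="My Title" target="_blank">innerHTML may be an <img /></a>';

preg_match($posted, $link, $p);
preg_match($variant, $link, $v);
// $p[1] and $p[2] are empty strings; $v[1] and $v[2] hold the href and title.

// Anchor without href, also from earlier in the thread.
$noHref = '<a id="t" class="s" rel="m"><span id="s">Show</span></a>';
preg_match($variant, $noHref, $n);
// $n[1] and $n[2] are empty strings; $n[3] holds the span.

var_dump($p[1], $p[2], $v[1], $v[2], $v[3], $n[3]);
```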

    You have been a huge help and I can't thank you enough. 


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-18-2008, 10:17 AM 48447 in reply to 48445

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    jflesher:
    ...

    You have been a huge help and I can't thank you enough. 

    You're most welcome!

  •  11-18-2008, 11:53 AM 48454 in reply to 48447

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    I added this to http://regexlib.com; search for Anchor Tag

    http://regexlib.com/Search.aspx?k=Anchor+Tag&c=-1&m=-1&ps=20

    I gave you credit.

    Now, if only I could get them to add the ability to test PHP functions. I don't see that happening on a .NET site, so maybe we need a PHP counterpart. Do you know whether they have one, or one they might link to for testing?

    Thanks again.


    Jeff Flesher Retired USAF Disabled Gulf War Vet