PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

Last post 06-22-2009, 8:48 AM by prometheuzz. 11 replies.
  •  11-17-2008, 2:43 PM 48395

    PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    In PHP, using preg_match_all, I need to extract all anchor tags into an array with href, title and innerHTML.

    <a href="http://mydomain.com/" title="My Title">My InnerHTML</a>
    <a href="http://mydomain.com/">My InnerHTML</a>
    <a href="http://mydomain.com/" title="My Title"><img scr="" /></a>

    Array
    (
        [0] => Array
        (
            [0] => "http://mydomain.com/"
            [1] => "My Title"
            [2] => "My InnerHTML"
        )
        [1] => Array
        (
            [0] => "http://mydomain.com/"
            [1] => ""
            [2] => "My InnerHTML"
        )
        [2] => Array
        (
            [0] => "http://mydomain.com/"
            [1] => "My Title"
            [2] => "<img scr="" />"
        )
    )

    I'm currently using "'<\s*a\s.*?href\s*=\s*([\"\'])?(?(1) (.*?)\\1 | ([^\s\>]+))'isx".
    My lack of regex-mindedness made me believe I could just add ?title\s*=\s*([\"\']),
    but all efforts have failed, and so have searches for an expression that works.


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-17-2008, 2:54 PM 48396 in reply to 48395

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    Give this a try:

    '#<a\s+href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?\s*>((?:(?!</a>).)*)</a>#i'

  •  11-17-2008, 5:04 PM 48406 in reply to 48396

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    Thanks. It took me a while to figure out why it wasn't returning all my links, until I noticed the ones with other attributes like target="_blank". The img tag in the innerHTML isn't showing up either. How do I include them in this expression?

    Some things you might see in a link that this will have to work with:
    <a id="myLink" class="myClass" href="http://mydomain.com/#bookmark" title="My Title" target="_blank">innerHTML may be an <img /></a>

    Sorry I didn't ask for this from the start. 

    Thanks so much for your help


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-18-2008, 12:59 AM 48421 in reply to 48406

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    My original proposal can be used with a minor tweak:

    '#<a\s+.*?href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i'
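
    A minimal sketch of how the tweaked expression might be exercised, assuming the sample links posted earlier in this thread as input. The variable names ($pattern, $html, $matches) are illustrative and not from the thread; the pattern itself is copied unchanged.

```php
<?php
// Pattern from the post above, copied unchanged.
$pattern = '#<a\s+.*?href=[\'"]([^\'"]+)[\'"]\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i';

// Sample links from earlier posts, one per line (assumed input).
$html = '<a href="http://mydomain.com/">My InnerHTML</a>' . "\n"
      . '<a id="myLink" class="myClass" href="http://mydomain.com/#bookmark" '
      . 'title="My Title" target="_blank">innerHTML may be an <img /></a>';

// PREG_SET_ORDER groups the results per link instead of per capture group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    // $match[1] = href, $match[2] = title (empty string when absent), $match[3] = innerHTML
    echo $match[1], ' | ', $match[2], ' | ', $match[3], "\n";
}
```

    Because "." does not match a newline without the s modifier, each link has to sit on a single line for this expression to find it.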

  •  11-18-2008, 1:07 AM 48423 in reply to 48421

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    I can't thank you enough.

    I've been programming for over 30 years now; I have learning regexp on my bucket list.


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-18-2008, 1:20 AM 48424 in reply to 48423

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    I'm going to drive you crazy with this one. It returns two anchor tags in one array; my guess is that this is because the first anchor tag has no href. How do I make the href optional, seeing it's a section also?

    <a id="t" class="s" rel="m"><span id="s">Show</span></a>...<a href="/home.pdf" title="PDF" onclick="window.open('...'); return false;" rel="nofollow"><img src="/pdf.png" alt="PDF" /></a>

    On a side note, I can't add the href due to a quirk in some browsers. It's an ugly hack, I know, but I haven't found a way around it.


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-18-2008, 3:05 AM 48427 in reply to 48424

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    jflesher:

    I'm going to drive you crazy with this one. It returns two anchor tags in one array; my guess is that this is because the first anchor tag has no href. How do I make the href optional, seeing it's a section also?

    ...

    No problem.

    Look at how I made the "title" attribute optional: I wrapped it in a non-capturing group and placed a '?' after the group, which makes the whole group optional. The same approach works for href:

    Mandatory:

    href=[\'"]([^\'"]+)[\'"]


    Optional:

    (?:href=[\'"]([^\'"]+)[\'"])?
    Note that if you used a plain capturing group ( ) instead of (?: ), the regex engine would also store the entire href="..." in a group of its own, something that you don't want, I presume.
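
    The difference between the two kinds of group can be checked with a short sketch. The sample tags and variable names below are assumptions made for illustration; only the href fragment comes from this post.

```php
<?php
$withHref    = '<a href="/home.pdf">PDF</a>';   // assumed sample tag
$withoutHref = '<a id="t">Show</a>';            // assumed sample tag

// Optional NON-capturing group: only the URL is stored, as group 1.
$nonCapturingPattern = '#<a\s+(?:href=[\'"]([^\'"]+)[\'"])?#i';

// Optional CAPTURING group: the whole href="..." becomes group 1
// and the URL is pushed to group 2.
$capturingPattern = '#<a\s+(href=[\'"]([^\'"]+)[\'"])?#i';

preg_match($nonCapturingPattern, $withHref, $nonCapturing);
preg_match($capturingPattern, $withHref, $capturing);

// The group is optional, so a tag without href still matches;
// the unmatched trailing group is simply left out of the result array.
$found = preg_match($nonCapturingPattern, $withoutHref, $noHref);

print_r($nonCapturing);
print_r($capturing);
print_r($noHref);
```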

    Post back if you have questions: I'd rather teach someone to fish than hand him a fish... and all that.

    ; )


     

  •  11-18-2008, 10:10 AM 48445 in reply to 48427

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    I was trying that last night in vain; I finally came up with this after sleeping on it.

    #<a\s*(?:href=[\'"]([^\'"]+)[\'"])?\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i

    It works great. In doing this last part by myself I didn't cross regexp off my bucket list, but I did remember things I've learned over the years. Having grown up using grep, I find this much better and more powerful; and when you need one, you really need it.

    This one should be added to the library; it is very useful. 

    You have been a huge help and I can't thank you enough. 
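
    A quick check of the expression in this post, sketched against the two adjacent anchors from the earlier question. Variable names are illustrative; the third input is an assumed extra test case, not from the thread. Under these assumptions the two anchors come back as separate matches. The sketch also suggests that href is only captured when it directly follows "<a", because nothing in the expression skips over attributes that come before it.

```php
<?php
// The expression from the post above, copied unchanged.
$pattern = '#<a\s*(?:href=[\'"]([^\'"]+)[\'"])?\s*(?:title=[\'"]([^\'"]+)[\'"])?.*?>((?:(?!</a>).)*)</a>#i';

// The two adjacent anchors from the earlier post (the first has no href).
$html = '<a id="t" class="s" rel="m"><span id="s">Show</span></a>...'
      . '<a href="/home.pdf" title="PDF" onclick="window.open(\'...\'); return false;" rel="nofollow">'
      . '<img src="/pdf.png" alt="PDF" /></a>';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    // Optional groups that did not take part come back as empty strings.
    printf("href=[%s] title=[%s] inner=[%s]\n", $match[1], $match[2], $match[3]);
}

// Assumed extra case: href is NOT the first attribute, so the optional
// href group is skipped and group 1 stays empty.
preg_match($pattern, '<a id="myLink" href="http://mydomain.com/">text</a>', $hrefSecond);
printf("href=[%s] inner=[%s]\n", $hrefSecond[1], $hrefSecond[3]);
```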


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  11-18-2008, 10:17 AM 48447 in reply to 48445

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    jflesher:
    ...

    You have been a huge help and I can't thank you enough. 

    You're most welcome!

  •  11-18-2008, 11:53 AM 48454 in reply to 48447

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    I added this to http://regexlib.com; search for Anchor Tag

    http://regexlib.com/Search.aspx?k=Anchor+Tag&c=-1&m=-1&ps=20

    I gave you credit.

    Now, if only I could get them to add the ability to test PHP functions; not that I see this happening on a .NET site. Maybe we need a PHP counterpart. Do you know if they have one, or one they might link to for testing?

    Thanks again.


    Jeff Flesher Retired USAF Disabled Gulf War Vet
  •  06-20-2009, 5:44 AM 54149 in reply to 48454

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    Hi,

    What if title and href are reversed? And what if I want to extract three or four different attributes? Do I need to add every possible order, or is it possible to search for something without knowing the order?

    Examples:
    <a href="LINK" title="TITLE" target="_BLANK">TEXT</a>
    <a title="TITLE" href="LINK" target="_BLANK">TEXT</a>
    etc.

    I'd like to have something like this:
    '#<a(?: ([a-z]{2,}).*=.*["\'](.*)["\'])?>(.*)</a>#Usi'

    that results in:
    array(
      0 => array(
        0 => array('href', 'title', 'target'),
        1 => array('title', 'href', 'target'),
      ),
      1 => array(
        0 => array('LINK', 'TITLE', '_BLANK'),
        1 => array('TITLE', 'LINK', '_BLANK'),
      ),
      2 => 'TEXT',
    )

    Or do I need to use two preg_match()'s to deal with it?


  •  06-22-2009, 8:48 AM 54179 in reply to 54149

    Re: PHP: preg_match_all on Anchor Tags with href, Title and InnerHTML

    > What if title and href are reversed? And what if I want to extract three or four different attributes?

    Try this:

    $text = '<a href="LINK1" title="TITLE1" target="_BLANK1">TEXT1</a>
    ...some text ...
    <a title="TITLE2" href="LINK2" target="_BLANK2">TEXT2</a>
    some more text...
    <a title="TITLE3" href="LINK3">TEXT3</a>';

    preg_match_all(
      '#<a\s
        (?:(?= [^>]* href="   (?P<href>  [^"]*) ")|)
        (?:(?= [^>]* title="  (?P<title> [^"]*) ")|)
        (?:(?= [^>]* target=" (?P<target>[^"]*) ")|)
        [^>]*>
        (?P<text>[^<]*)
        </a>
      #xi',
      $text,
      $matches,
      PREG_SET_ORDER
    );

    foreach($matches as $match) {
      echo "entire match : " . $match[0]        . "\n";
      echo "    href     : " . $match['href']   . "\n";
      echo "    title    : " . $match['title']  . "\n";
      echo "    target   : " . $match['target'] . "\n";
      echo "    text     : " . $match['text']   . "\n";
    }

    /* output:

    entire match : <a href="LINK1" title="TITLE1" target="_BLANK1">TEXT1</a>
        href     : LINK1
        title    : TITLE1
        target   : _BLANK1
        text     : TEXT1
    entire match : <a title="TITLE2" href="LINK2" target="_BLANK2">TEXT2</a>
        href     : LINK2
        title    : TITLE2
        target   : _BLANK2
        text     : TEXT2
    entire match : <a title="TITLE3" href="LINK3">TEXT3</a>
        href     : LINK3
        title    : TITLE3
        target   :
        text     : TEXT3
       
    */
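
    The question also asked whether two preg_match() calls would be needed. A rough sketch of that two-pass alternative follows: the first pass grabs each anchor's raw attribute string and inner HTML, the second pass pulls name="value" pairs out of that string in whatever order they occur. The function name extractAnchors and the result layout are hypothetical, not from the thread, and the sketch assumes quoted attribute values that contain no ">" character.

```php
<?php
/**
 * Hypothetical helper (not from the thread): returns one entry per anchor,
 * with its attributes keyed by lower-cased name plus its inner HTML.
 */
function extractAnchors(string $html): array
{
    $anchors = [];

    // Pass 1: capture the raw attribute string and the inner HTML of each anchor.
    preg_match_all('#<a\b([^>]*)>(.*?)</a>#is', $html, $tags, PREG_SET_ORDER);

    foreach ($tags as $tag) {
        // Pass 2: collect name="value" (or name='value') pairs, in any order.
        preg_match_all('#([a-z][a-z0-9-]*)\s*=\s*(["\'])(.*?)\2#is', $tag[1], $attrs, PREG_SET_ORDER);

        $attributes = [];
        foreach ($attrs as $attr) {
            $attributes[strtolower($attr[1])] = $attr[3];
        }

        $anchors[] = ['attributes' => $attributes, 'text' => $tag[2]];
    }

    return $anchors;
}

// Same sample text as in the post above.
$text = '<a href="LINK1" title="TITLE1" target="_BLANK1">TEXT1</a>
...some text ...
<a title="TITLE2" href="LINK2" target="_BLANK2">TEXT2</a>
some more text...
<a title="TITLE3" href="LINK3">TEXT3</a>';

$anchors = extractAnchors($text);
print_r($anchors);
```

    Compared with the lookahead version, this does not need the attribute names listed up front, at the cost of a second regex call per anchor.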

     
