
html pattern matching

  •  04-15-2007, 2:42 PM

    My script parses a game website for information about players in clans.  My pattern isn't pretty, but it generally works; the problem is that the HTML I am matching against is sometimes inconsistent.

    $pattern = '|<span[^>]*><li>\s*([^:]+):\s*</span><a[^>]*>([a-zA-Z]+)\s*\((.+?)\)\s*"([^"]*)"</a><img\s+.*?alt\s*=\s*"([^"]*)"[^>]*>(&nbsp;)*|si';

    The following is a sample of what I am matching; the first line labels the parts I am trying to pull as match1 through match6.  Line 4 has no match4 (the quoted text), so the pattern keeps going into the next line.  This causes that match row to be half of line 4 and half of line 5.  I would like the pattern to pull either match4 or null for that group, but I haven't found anything that helps me do this.  Any help would be greatly appreciated.

    <span class="big"><LI> match1: </span><a class="link" href="http://users.nexustk.com/?name=jelia" target="_new">match2 (match3) "match4"</a><IMG SRC="buttongreen.gif" WIDTH="13" HEIGHT="13" ALT="match5">match6<BR>
    <span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=jelia" target="_new">Jelia (Swift - Ee San) "Paladin"</a><IMG SRC="buttongreen.gif" WIDTH="13" HEIGHT="13" ALT="Active">&nbsp;<BR>
    <span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kailee" target="_new">KaiLee (Chung Ryong - Level 99) "Ascendant"</a><IMG SRC="buttongreen.gif" WIDTH="13" HEIGHT="13" ALT="Active">&nbsp;<BR>
    <span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kaufman" target="_new">Kaufman (Rogue - Level 66) </a><IMG SRC="buttonyellow.gif" WIDTH="13" HEIGHT="13" ALT="Inactive"><IMG SRC="Notreg.gif" WIDTH="15" HEIGHT="12" ALT="Unregistered"><BR>
    <span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kawakami" target="_new">Kawakami (Baekho - Level 99) "Apprentice"</a><IMG SRC="buttonred.gif" WIDTH="13" HEIGHT="13" ALT="Absent"><IMG SRC="Notreg.gif" WIDTH="15" HEIGHT="12" ALT="Unregistered"><BR>
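
One possible direction, offered as a sketch rather than a tested fix for the real site: make the quoted part optional with `(?:"([^"]*)")?` and stop the parenthesised group at the first `)` with `[^)]*`, so a row without match4 can no longer run on into the next row. The `~` delimiter, the dropped `s` flag, the array keys (`rank`, `name`, `path`, `title`, `status`) and the mapping of an empty capture to `null` are all assumptions made for illustration; the three rows are copied from the sample above.

```php
<?php
// Three rows copied from the sample; the Kaufman row has no quoted text (match4).
$html = implode("\n", [
    '<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=jelia" target="_new">Jelia (Swift - Ee San) "Paladin"</a><IMG SRC="buttongreen.gif" WIDTH="13" HEIGHT="13" ALT="Active">&nbsp;<BR>',
    '<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kaufman" target="_new">Kaufman (Rogue - Level 66) </a><IMG SRC="buttonyellow.gif" WIDTH="13" HEIGHT="13" ALT="Inactive"><IMG SRC="Notreg.gif" WIDTH="15" HEIGHT="12" ALT="Unregistered"><BR>',
    '<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kawakami" target="_new">Kawakami (Baekho - Level 99) "Apprentice"</a><IMG SRC="buttonred.gif" WIDTH="13" HEIGHT="13" ALT="Absent"><IMG SRC="Notreg.gif" WIDTH="15" HEIGHT="12" ALT="Unregistered"><BR>',
]);

// Changes from the original pattern (all assumptions, not a verified fix):
//   ([^)]*)          instead of (.+?)   so group 3 cannot cross a ")"
//   (?:"([^"]*)")?   makes the quoted text (group 4) optional
//   [^>]*?           instead of .*?     so the alt lookup stays inside the <img> tag
//   ((?:&nbsp;)*)    captures the whole run of &nbsp; (possibly empty)
$pattern = '~<span[^>]*><li>\s*([^:]+):\s*</span><a[^>]*>([a-zA-Z]+)\s*'
         . '\(([^)]*)\)\s*(?:"([^"]*)")?\s*</a>'
         . '<img\s[^>]*?alt\s*=\s*"([^"]*)"[^>]*>((?:&nbsp;)*)~i';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$players = [];
foreach ($matches as $m) {
    $players[] = [
        'rank'   => trim($m[1]),
        'name'   => $m[2],
        'path'   => $m[3],
        // An optional group that did not take part in the match comes back
        // as '' here; map it to null (an empty "" in the HTML maps to null too).
        'title'  => $m[4] !== '' ? $m[4] : null,
        'status' => $m[5],
    ];
}

foreach ($players as $p) {
    $title = $p['title'] === null ? 'NULL' : $p['title'];
    printf("%s | %s | %s | %s\n", $p['name'], $p['path'], $title, $p['status']);
}
```

With these three rows, each line of HTML should produce its own match row, and the Kaufman row should carry `null` in place of match4 instead of borrowing "Apprentice" from the row below it. On PHP 7.2 or later, the `PREG_UNMATCHED_AS_NULL` flag could replace the manual `''` check.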

     
