I need a regular expression to extract the Event, Name, School, Final Swim Time, and Swim Threshold (the "DIIA" marker) from a results page like the one at http://www.gliac.org/sports/mswimdive/2010-11/stats/Results_Wed_Finals.htm. Note that the results are separated from the rest of the page by the "<pre>" HTML tag.
Each "line" looks like this:
1 Donahue, Maura 19 INDY 10:39.77 10:03.60 DIIA
Unfortunately, I'm not sure exactly how to do so. One of the problems (in my mind!) is that a line sometimes shows the swimmer's age (19) and other times doesn't. In addition, some results show the seed time (10:39.77) while others only have the final time (10:03.60).
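To make those variations concrete, here is a rough sketch of the kind of pattern that might cope with them, treating the age and the seed time as optional groups. The function name parse_result_line is hypothetical, and the sketch assumes the school is always a single token such as INDY, that times look like 10:03.60 or 59.87, and that the threshold is a run of capital letters. None of that has been checked against every line on the page.

```php
<?php
// Hypothetical helper, not part of any library: parse one result line.
// Assumptions: the school is a single token (e.g. "INDY"), times look like
// "10:03.60" or "59.87", and the threshold is a run of capital letters.
function parse_result_line(string $line): ?array
{
    $time = '(?:\d+:)?\d{1,2}\.\d{2}';
    $pattern = '/^\s*(?<place>\d+)\s+'          // finishing place
        . '(?<name>[^,\d]+,\s*[^\d]+?)\s+'      // "Last, First"
        . '(?:(?<age>\d{1,2})\s+)?'             // optional age
        . '(?<school>\S+)\s+'                   // school token
        . '(?:(?<seed>' . $time . ')\s+)?'      // optional seed time
        . '(?<final>' . $time . ')'             // final time
        . '(?:\s+(?<threshold>[A-Z]+))?\s*$/';  // optional threshold

    if (!preg_match($pattern, $line, $m)) {
        return null; // not a result line
    }

    // Optional groups that did not take part in the match come back as ''
    // or are missing entirely, so normalise both cases to null.
    $opt = function (string $key) use ($m) {
        return (isset($m[$key]) && $m[$key] !== '') ? $m[$key] : null;
    };

    return [
        'place'     => (int) $m['place'],
        'name'      => trim($m['name']),
        'age'       => $opt('age') === null ? null : (int) $opt('age'),
        'school'    => $m['school'],
        'seed'      => $opt('seed'),
        'final'     => $m['final'],
        'threshold' => $opt('threshold'),
    ];
}

print_r(parse_result_line('1 Donahue, Maura 19 INDY 10:39.77 10:03.60 DIIA'));
```

Requiring a decimal point in every time is what keeps a bare age like 19 from being mistaken for a time when the seed column is missing.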
I started the regex by trying to split up to the "," in the first name, but failed miserably.
I'm using the Simple HTML DOM library (simple_html_dom) to extract the contents of the HTML page.
My code looks like this (I'm using PHP):
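// Assumes the Simple HTML DOM library file sits next to this script
require_once 'simple_html_dom.php';

$results_url = "http://www.gliac.org/sports/mswimdive/2010-11/stats/Results_Wed_Finals.htm";
// Create a DOM object from a URL
$html = file_get_html($results_url);
// The results live inside the first <pre> element
$pre = $html ? $html->find('pre', 0) : null;
if (!$pre) {
    $parse_error = "Yes";
}
if (!isset($parse_error)) {
    // My attempt: split on a number that is followed by whitespace, then letters
    $regex = "/[0-9]+(?=[ \s]+)(?=[A-Za-z]+)/";
    // preg_split() takes the limit as its third argument and the flags as its fourth
    $splits = preg_split($regex, $pre->plaintext, -1, PREG_SPLIT_DELIM_CAPTURE);
    print_r($splits);
}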
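Because the event name is not on the result line itself, one possible direction is to walk the <pre> text line by line and remember the most recent event heading. The sketch below is only that, a sketch: group_lines_by_event is a hypothetical name, and it assumes each event begins with a heading line of the form "Event", a number, then the event name, and that result lines begin with a finishing place followed by a name. The real page may be laid out differently.

```php
<?php
// Hypothetical helper: group raw result lines under the event they belong to.
// Assumption (not verified against the page): each event starts with a
// heading line such as "Event 1  <event name>", and result lines start
// with a finishing place followed by a name.
function group_lines_by_event(string $preText): array
{
    $events = [];
    $current = null;

    // \R matches any kind of line break
    foreach (preg_split('/\R/', $preText) as $line) {
        if (preg_match('/^\s*Event\s+\d+\s+(.+?)\s*$/', $line, $m)) {
            // New event heading: start a fresh bucket for its lines
            $current = $m[1];
            $events[$current] = [];
        } elseif ($current !== null && preg_match('/^\s*\d+\s+[A-Za-z]/', $line)) {
            // Looks like "<place> <name>...": keep it under the current event
            $events[$current][] = trim($line);
        }
        // Anything else (column headers, separators, blank lines) is skipped
    }

    return $events;
}
```

Each collected line could then be matched individually, which keeps the per-line pattern simple and anchored to a single line.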
If you can help out or point me in the right direction, that would be awesome! Is it even possible to run a regex against the results to extract this data?
Thank you!