I have to confess that I'm quite a newbie at regex, and at PHP as well ... more of an open-source leech :) Normally I manage to adjust the code so that it suits my needs, but this time I'm up against the wall ;) Thanks for all your help so far ...
This is how I implemented your last code snippet, but it only gives me the data from the first TD, not the second and third. The first regex you posted kind of worked: it stripped away the fourth TD, but it also gave me a lot of empty 'items' in my feed, probably the ones that were stripped away:
<?php
// Get page
$url = "http://landris.hh.se/4DACTION/WebShowRoll/1-21?offset=4320&update=0&rows=0&page=0&branch=4&group=-21&start=yes&stop=yes&order=ascending&web_cols=1&web_numChars=-";
$data = file_get_contents($url);
// Get content items
preg_match_all ("#<TR>\s*<TD\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</TD>\s*<TD\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</TD>\s*<TD\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</TD>#isx", $data, $matches);
// Begin feed
header ("Content-Type: text/xml; charset=ISO-8859-1");
echo "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n";
?>
<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<channel>
<title>Lektioner:</title>
<description>Campus Varberg</description>
<link>http://landris.hh.se</link>
<atom:link href="http://www.campus.varberg.se/dev/scrape/schema.php" rel="self" type="application/rss+xml" />
<language>sv-SE</language>
<?php
// Loop through each content item
foreach ($matches[0] as $match) {
// First, get title
preg_match ("#<TR>\s*<TD\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</TD>\s*<TD\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</TD>\s*<TD\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</TD>#isx", $match, $temp);
$title = $temp['1'];
$title = strip_tags($title);
$title = str_replace('&nbsp;', '', $title);
$title = trim($title);
// Third, get text
preg_match ("/<font class=\'a_text\'>([^`]*?)<\/font>/", $match, $temp);
$text = $temp['1'];
$text = strip_tags($text);
$text = trim($text);
$text = str_replace('&nbsp;', '', $text);
$token = md5 (uniqid ());
// Echo RSS XML
echo "<item>\n";
echo "\t\t\t<title>" . strip_tags($title) . "</title>\n";
echo "\t\t\t<description>" . strip_tags($text) . "</description>\n";
echo "\t\t\t<content:encoded><![CDATA[ \n";
echo $text . "\n";
echo " ]]></content:encoded>\n";
echo "<guid>http://www.campus.varberg.se/$token.html</guid>\n";
echo "\t\t</item>\n";
}
?>
</channel>
</rss>
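
To make clearer what I'm after, here is a minimal sketch of the result I'd like: all three TDs of each row, and no empty items. This is only my guess at how it might be done. The sample row in $data is made up (the real page markup may differ), and the names $cell, $rows and $items are just placeholders of mine. It uses PREG_SET_ORDER so that each match carries its own three captures.

```php
<?php
// Hypothetical sample row standing in for the fetched page; real markup may differ.
$data = "<TR><TD style='width:60px'><font class='a_text'>08:15</font></TD>"
      . "<TD style='width:90px'><font class='a_text'>Math&nbsp;A</font></TD>"
      . "<TD style='width:50px'><font class='a_text'>Room 12</font></TD>"
      . "<TD style='width:50px'><font class='a_text'>ignored</font></TD></TR>";

// One capture group per cell; [^<]* stops at the closing </font> tag.
$cell = "<TD\s+style=[^>]+>\s*<font\s+class='a_text'>\s*([^<]*)</font>\s*</TD>";
$pattern = "#<TR>\s*$cell\s*$cell\s*$cell#is";

// PREG_SET_ORDER groups the captures per row: $row[1], $row[2], $row[3].
preg_match_all($pattern, $data, $rows, PREG_SET_ORDER);

$items = [];
foreach ($rows as $row) {
    // Clean each of the three captured cells the same way.
    $cells = array_map(function ($value) {
        return trim(str_replace('&nbsp;', ' ', strip_tags($value)));
    }, array_slice($row, 1, 3));

    // Skip rows where every cell is empty, so they do not become empty feed items.
    if (implode('', $cells) === '') {
        continue;
    }
    $items[] = $cells;
}

// Print one line per row; in the feed these would fill <title> and <description>.
foreach ($items as $cells) {
    echo implode(' | ', $cells), "\n";
}
```

With the made-up row above, $items ends up holding a single entry with the first three cell values, and the fourth TD is left out.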