RegexAdvice

PHP scrape and regex problems ... how to exclude?

Last post 12-01-2008, 2:47 PM by Tomas Johansson. 18 replies.
  •  11-30-2008, 6:06 AM 48992 in reply to 48981

    Re: PHP scrape and regex problems ... how to exclude?

    ddrudik ... my hero, it works like a charm. You've saved me hours and hours of "trial and error" ...

     

    I combined your code with my old version to collect all the data in the first three TDs and put it all into one single item, like this:

    <?php
    $text = file_get_contents('http://landris.hh.se/4DACTION/WebShowRoll/1-21?offset=4320&update=0&rows=0&page=0&branch=4&group=-21&start=yes&stop=yes&order=ascending&web_cols=1&web_numChars=-');
    $regex = "#
        <tr>\s*
        <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>\s*
        <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>\s*
        <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>
    #isx";

    if (preg_match_all($regex, $text, $matches)) {
        // $matches[0] holds each full three-cell match, markup included
        foreach ($matches[0] as $title) {
            // Drop the tags and &nbsp; entities, keeping only the cell text
            $title = strip_tags($title);
            $title = str_replace('&nbsp;', '', $title);
            $title = trim($title);

            echo "<item>\n\t<title>" . $title . "</title>\n\t<description>" . $title . "</description>\n\t<guid>http://www.campus.varberg.se/" . md5(uniqid()) . ".html</guid>\n</item>\n";
        }
    }
    ?>
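The loop above walks `$matches[0]`, the full match with all of its markup, which is why it needs `strip_tags()`. The numbered entries `$matches[1]` to `$matches[3]` already hold just the captured cell text. A minimal sketch comparing the two routes, run against a made-up row modeled on the page's markup (the sample HTML is hypothetical, not fetched from landris.hh.se). If the real markup has line breaks between cells, as this sample does, `strip_tags()` alone leaves them in the title, so the sketch also collapses whitespace:

```php
<?php
// Hypothetical row modeled on the scraped page's markup (not real data).
$html = "<tr>\n"
    . "<td style='width:80px'><font class='a_text'>08:15</font></td>\n"
    . "<td style='width:80px'><font class='a_text'>Matematik</font></td>\n"
    . "<td style='width:80px'><font class='a_text'>Sal 12</font></td>\n"
    . "</tr>\n";

// Same three-cell pattern as in the thread.
$regex = "#
    <tr>\s*
    <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>\s*
    <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>\s*
    <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>
#isx";

$count = preg_match_all($regex, $html, $matches);

// Default PREG_PATTERN_ORDER: $matches[0] = full matches (markup included),
// $matches[1..3] = the text captured by each group.
$fromFullMatch = trim(preg_replace('/\s+/', ' ', strip_tags($matches[0][0])));
$fromGroups = $matches[1][0] . ' ' . $matches[2][0] . ' ' . $matches[3][0];

echo $count, "\n";         // 1
echo $fromFullMatch, "\n"; // 08:15 Matematik Sal 12
echo $fromGroups, "\n";    // 08:15 Matematik Sal 12
```

Joining the capture groups avoids the tag stripping entirely, which is the approach the later reply takes.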

  •  11-30-2008, 7:14 AM 48993 in reply to 48981

    Re: PHP scrape and regex problems ... how to exclude?

    Another question ... is it possible to add a custom message if no $matches were found?

     

    Never mind ... I solved that by adding:

    else {
        // No matches: output one placeholder item ("There are no lessons scheduled.")
        echo "<item>\n\t<title>DET FINNS INGA LEKTIONER SCHEMALAGDA.</title>\n\t<description>DET FINNS INGA LEKTIONER SCHEMALAGDA.</description>\n\t<guid>http://www.campus.varberg.se/" . md5(uniqid()) . ".html</guid>\n</item>\n";
    }

     

     

    Thanks again for all your help 
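Note that `if (preg_match_all(...))` sends two different situations into the `else` branch: no rows matched (return value `0`) and the pattern itself failed (return value `false`). If the feed should only report "no lessons" when the page really had none, the two can be told apart with strict comparisons. A small sketch; `describeResult()` is a hypothetical helper, not part of the thread's code:

```php
<?php
// Hypothetical helper: tell "no rows matched" apart from "the regex failed".
function describeResult(string $regex, string $subject): string
{
    // preg_match_all() returns the number of full matches, or false on error;
    // @ silences the warning PHP emits for an invalid pattern.
    $n = @preg_match_all($regex, $subject, $matches);
    if ($n === false) {
        return 'error';
    }
    if ($n === 0) {
        return 'no matches';
    }
    return "$n matches";
}

echo describeResult('#<td>#i', '<p>no rows</p>'), "\n";     // no matches
echo describeResult('#<td>#i', '<td></td><td></td>'), "\n"; // 2 matches
echo describeResult('#([a-z]#', 'abc'), "\n";               // error (unclosed group)
```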

  •  11-30-2008, 8:25 AM 48996 in reply to 48993

    Re: PHP scrape and regex problems ... how to exclude?

    Consider this code, which covers your last two requirements (the first three cells joined into one item, and a custom message when nothing matches):

    <?php
    header("Content-Type: text/xml; charset=ISO-8859-1");
    echo "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n";
    ?>
    <rss version="2.0"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:content="http://purl.org/rss/1.0/modules/content/"
      xmlns:admin="http://webns.net/mvcb/"
      xmlns:atom="http://www.w3.org/2005/Atom"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel>
    <title>Lektioner:</title>
    <description>Campus Varberg</description>
    <link>http://landris.hh.se</link>
    <atom:link href="http://www.campus.varberg.se/dev/scrape/schema.php" rel="self" type="application/rss+xml" />
    <language>sv-SE</language>
    <?php
    $text = file_get_contents('http://landris.hh.se/4DACTION/WebShowRoll/1-21?offset=4320&update=0&rows=0&page=0&branch=4&group=-21&start=yes&stop=yes&order=ascending&web_cols=1&web_numChars=-');
    $regex = "#
        <tr>\s*
        <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>\s*
        <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>\s*
        <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>
    #isx";
    if (preg_match_all($regex, $text, $matches)) {
        // One <item> per matched row; groups 1-3 hold the three cell texts
        for ($i = 0; $i < count($matches[0]); $i++) {
            $title = $matches[1][$i] . ' ' . $matches[2][$i] . ' ' . $matches[3][$i];
            echo "<item>\n\t<title>$title</title>\n\t<description>$title</description>\n\t<guid>http://www.campus.varberg.se/" . md5(uniqid()) . ".html</guid>\n</item>\n";
        }
    } else {
        // No matches: "There are no lessons scheduled."
        echo "<item>\n\t<title>DET FINNS INGA LEKTIONER SCHEMALAGDA.</title>\n\t<description>DET FINNS INGA LEKTIONER SCHEMALAGDA.</description>\n\t<guid>http://www.campus.varberg.se/" . md5(uniqid()) . ".html</guid>\n</item>\n";
    }
    ?>
    </channel>
    </rss>
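The loop above indexes three parallel arrays by `$i`. An alternative sketch (not the code from the thread) passes the `PREG_SET_ORDER` flag so each row's captures arrive together, and escapes the joined text with `htmlspecialchars()`, since a raw `&` or a leftover `&nbsp;` in a cell would make the RSS invalid XML. `rssItem()` and the sample row are hypothetical; the `'ISO-8859-1'` charset argument assumes the scraped page uses the same encoding the feed declares (with the default UTF-8, `htmlspecialchars()` returns an empty string for Latin-1 bytes such as Swedish letters):

```php
<?php
// Hypothetical helper: turn one PREG_SET_ORDER match set into an RSS <item>.
// Assumes the scraped text is ISO-8859-1, like the encoding the feed declares.
function rssItem(array $set): string
{
    // $set[0] is the whole match; $set[1], $set[2], $set[3] are the three cells.
    $cells = array_map(function ($cell) {
        return trim(str_replace('&nbsp;', ' ', $cell));
    }, array_slice($set, 1, 3));

    // Escape &, <, > and quotes so the feed stays well-formed XML.
    $title = htmlspecialchars(implode(' ', $cells), ENT_QUOTES, 'ISO-8859-1');

    // The random <guid> from the thread is left out here for brevity.
    return "<item>\n\t<title>$title</title>\n\t<description>$title</description>\n</item>\n";
}

// Made-up sample row, not real data from landris.hh.se.
$html = "<tr><td style='x'><font class='a_text'>10:00</font></td>"
    . "<td style='x'><font class='a_text'>Fysik & Kemi</font></td>"
    . "<td style='x'><font class='a_text'>&nbsp;Sal 3</font></td></tr>";

$regex = "#
    <tr>\s*
    <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>\s*
    <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>\s*
    <td\s+style=[^>]+>\s*<font\s*class='a_text'>\s*([^>]*)</font>\s*</td>
#isx";

$items = '';
// PREG_SET_ORDER groups results per match, so each $set is one table row.
if (preg_match_all($regex, $html, $sets, PREG_SET_ORDER)) {
    foreach ($sets as $set) {
        $items .= rssItem($set);
    }
}
echo $items;
```

For the sample row this prints a single item titled `10:00 Fysik &amp; Kemi Sal 3`. The `else` branch with the "no lessons" message would stay as in the reply above.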


  •  12-01-2008, 2:47 PM 49035 in reply to 48996

    Re: PHP scrape and regex problems ... how to exclude?

    Thanks a lot 
