
Problem isolating multiple blocks in HTML

Last post 05-13-2008, 8:43 PM by Aussie Susan. 3 replies.
  •  05-13-2008, 7:39 AM 42174

    Problem isolating multiple blocks in HTML

    I'm currently using preg_replace_callback in PHP to parse HTML templates and replace placeholders in the templates with content from the database.

    All is working fine except where I have multiple placeholders of the same kind. My template looks like this:

    <p>some text</p>
    <h1>Resources 1</h1>
      {cms type="resource" id="1"}
        <a href="{link}">{title}</a>
      {/cms}
    <p>some more text</p>
    <h1>Resources 2</h1>
      {cms type="resource" id="2"}
        <a href="{link}">{title}</a>
      {/cms}
    <p>lots more text</p> 

    and the regex I'm using is:

    '/{cms type="resource" id="([\d]*)"}([\s\S]*){\/cms}/'

    But this treats the first {cms type="resource" id="1"} as the start of the block and the last {/cms} as the end, returning everything in between:

    <a href="{link}">{title}</a>
    {/cms}
    <p>some more text</p>
    <h1>Resources 2</h1>
    {cms type="resource" id="2"}
    <a href="{link}">{title}</a>

    Can you suggest how I can get the regex to treat the first {/cms} as the end of the block, so that it returns only the contents of a single block?

    Any help gratefully appreciated.

    Cheers

    Mike
     
  •  05-13-2008, 12:26 PM 42189 in reply to 42174

    Re: Problem isolating multiple blocks in HTML

    I believe adding a ? solves the problem but I'm not sure why:

    /{cms type="resource" id="([\d]*)"}([\s\S]*?){\/cms}/

    What does the ? in ([\s\S]*?) do?

  •  05-13-2008, 5:05 PM 42212 in reply to 42189

    Re: Problem isolating multiple blocks in HTML

    Used in this manner (directly after a quantifier such as *), the ? makes the match non-greedy, also called lazy.

    By default a quantifier is greedy: it matches as much as possible.

    So this regex: .*b

    applied against this text:

    aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab

    matches on the whole string:

    aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab

    But: .*?b matches as little as possible, so it matches on:

    aaaaaaaaaaaaaaaaaaaaaaaaaab

    stopping as soon as the match can be completed.
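
    The difference is easy to check in code. The sketch below uses Python's re module rather than PHP, on the assumption that these two simple patterns behave the same way in PCRE; the test string is built to the same shape as the one above, and the exact counts of a's are arbitrary.

```python
import re

# A string shaped like the example above: a run of a's, a b,
# a longer run of a's, and a final b. The lengths are arbitrary.
text = "a" * 26 + "b" + "a" * 40 + "b"

# Greedy: .* takes as much as it can, so the match ends at the last b.
greedy = re.match(r".*b", text).group(0)

# Non-greedy: .*? takes as little as it can, so the match ends at the first b.
lazy = re.match(r".*?b", text).group(0)

print(greedy == text)   # the greedy match covers the whole string
print(len(lazy))        # the lazy match is only the first run plus one b
```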

    Your ([\s\S]*) group matches any character, including newlines, so on its own it runs all the way to the last {/cms}. Adding the ? makes it take the fewest characters that still let the rest of the regex match, so it stops at the first closing tag.
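
    Applied to the template from the first post, the non-greedy pattern yields one match per block. This is a minimal sketch in Python rather than PHP, with re.sub and a callback standing in for preg_replace_callback; the RESOURCES dictionary and the render_block function are made-up placeholders for the database lookup, not anything from the original code.

```python
import re

template = """<p>some text</p>
<h1>Resources 1</h1>
  {cms type="resource" id="1"}
    <a href="{link}">{title}</a>
  {/cms}
<p>some more text</p>
<h1>Resources 2</h1>
  {cms type="resource" id="2"}
    <a href="{link}">{title}</a>
  {/cms}
<p>lots more text</p>
"""

# Hypothetical stand-in for the database content.
RESOURCES = {
    "1": {"link": "/docs/one", "title": "First resource"},
    "2": {"link": "/docs/two", "title": "Second resource"},
}

# Same pattern as in the thread, with the braces escaped and *? in the body.
LAZY = re.compile(r'\{cms type="resource" id="(\d*)"\}([\s\S]*?)\{/cms\}')
# The original greedy version, kept only for comparison.
GREEDY = re.compile(r'\{cms type="resource" id="(\d*)"\}([\s\S]*)\{/cms\}')


def render_block(match):
    """Fill {link} and {title} inside one block from the looked-up resource."""
    resource = RESOURCES[match.group(1)]
    body = match.group(2)
    return body.replace("{link}", resource["link"]).replace("{title}", resource["title"])


lazy_blocks = LAZY.findall(template)      # one (id, body) pair per block
greedy_blocks = GREEDY.findall(template)  # a single pair spanning both blocks
rendered = LAZY.sub(render_block, template)

print(len(lazy_blocks), len(greedy_blocks))
print(rendered)
```

    The callback receives each block separately, so the id captured in group 1 always belongs to the body captured in group 2.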

  •  05-13-2008, 8:43 PM 42221 in reply to 42174

    Re: Problem isolating multiple blocks in HTML

    Mike,

    Also make sure that you don't have nested tags in the text. If you do, the non-greedy match will stop at the inner closing tag, and other techniques will be needed (such as the 'recursive' operators).
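
    To see why nesting is a problem, here is a small sketch, again in Python on the assumption that the pattern behaves the same in PCRE. The nested template is invented for illustration. It only demonstrates the failure; Python's standard re module has no recursive patterns, so the recursive fix is not shown.

```python
import re

BLOCK = re.compile(r'\{cms type="resource" id="(\d*)"\}([\s\S]*?)\{/cms\}')

# An invented template in which block 2 sits inside block 1.
nested = (
    '{cms type="resource" id="1"}'
    'outer start '
    '{cms type="resource" id="2"}inner{/cms}'
    ' outer end'
    '{/cms}'
)

first = BLOCK.search(nested)

# The match opens at the outer tag but closes at the inner {/cms},
# so the captured body is cut short and an unmatched {/cms} is left over.
print(first.group(1))
print(first.group(2))
print(nested[first.end():])
```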

    Susan