
Nested elements

Last post 01-10-2010, 7:19 PM by Aussie Susan. 5 replies.
  •  12-27-2009, 9:19 AM 58094

    Nested elements

    Using the PCRE engine.

    I've been working on this one for months (on and off). It's actually for a code indenter, but I'll use HTML for the example.

    <ul>

    <li>bla bla</li>

    <ul>

    <li>.....</li>

    <li>more</li>

    </ul>

    </ul>

    It's easy to match everything from the first <ul> to the first </ul>, but that doesn't solve the problem. I need the match to run from the first <ul> to the last </ul>.
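
    To make the "first <ul> to the first </ul>" case concrete, here is a minimal sketch. It assumes the sample list above flattened onto one line, and the variable names are only illustrative. The lazy quantifier pairs the outer opening tag with the inner closing tag, which is why it does not solve the nesting problem.

```php
<?php
// The sample list from above, flattened onto one line for brevity.
$source = '<ul><li>bla bla</li><ul><li>.....</li><li>more</li></ul></ul>';

// Lazy .*? stops at the first "</ul>" it can reach, so the outer "<ul>"
// ends up paired with the inner "</ul>" and the final "</ul>" is left over.
preg_match('~<ul>.*?</ul>~s', $source, $match);

echo $match[0], "\n";
```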

    Remember that these are really code blocks, so they could just as well be nested Select statements or nested If/End If statements.

    I also need the same technique for code that converts an HTML Help contents file (*.hhc) to XML output for use in a treeview control; if you open an .hhc file in a text editor, it is easy to see what the issues are.

     

    Thanks for any and all input

     

    EDIT:  If you can figure it out using any engine, then please post it.  I can handle the conversion to Perl Compatible.

  •  12-28-2009, 5:40 PM 58109 in reply to 58094

    Re: Nested elements

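    PHP Code Example:
    <?php
    $sourcestring = "your source string";
    // Greedy .* with the s modifier (dot also matches newlines) runs from the first <ul> to the last </ul>.
    preg_match_all('~<ul>.*</ul>~s', $sourcestring, $matches);
    // Escape the result so the matched tags are displayed rather than rendered by the browser.
    echo "<pre>" . htmlspecialchars(print_r($matches, true)) . "</pre>";
    ?>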


  •  12-30-2009, 1:37 PM 58135 in reply to 58109

    Re: Nested elements

    Thanks.  I'm working on it now and I'll let you know.
  •  01-03-2010, 10:20 PM 58176 in reply to 58135

    Re: Nested elements

    Just a comment - Doug's code will give you what you asked for: a match from the first "<ul>" to the last "</ul>".

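    However, if you are trying to indent the code of each "<ul>...</ul>" block, this pattern finds the outermost block only when the input contains exactly one. When there are several sibling blocks, it matches everything from the first "<ul>" to the last "</ul>", including whatever lies between them. For example, it will match all of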

    <ul>
    xxxx
    </ul>
    yyy
    <ul>
    zzz
    </ul>
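
    This can be checked with a short sketch. It assumes the text above is the complete input, and the variable names are only illustrative.

```php
<?php
// Two sibling lists separated by unrelated text, as in the example above.
$source = "<ul>\nxxxx\n</ul>\nyyy\n<ul>\nzzz\n</ul>";

// The same greedy pattern as in the earlier reply.
preg_match_all('~<ul>.*</ul>~s', $source, $matches);

// The greedy .* swallows both lists and the "yyy" between them,
// so there is a single match and it equals the entire input.
$matchCount = count($matches[0]);
$coversWholeInput = ($matches[0][0] === $source);

echo "matches: $matchCount\n";
echo "covers whole input: " . ($coversWholeInput ? "yes" : "no") . "\n";
```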

    If you really are writing a code indenter and you really are interested in nested elements (as your title suggests), then you need to find each "<ul>" and its matching "</ul>", which is a different problem altogether.

    If that is the case, PCRE does support this type of problem (through its recursion syntax), but the pattern is a bit more complicated.
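
    As a rough sketch of what such a pattern might look like (one possible approach, not the only one): the balanced block is written as capture group 1, which calls itself with (?1), and the whole group sits inside a lookahead so that nested blocks are reported as well as the outer one. The sample input and variable names are assumptions made for illustration.

```php
<?php
// The original poster's sample, flattened onto one line for brevity.
$source = '<ul><li>bla bla</li><ul><li>.....</li><li>more</li></ul></ul>';

// Group 1 is a balanced block: "<ul>", then any mix of text that contains
// no <ul> or </ul> tag, or a nested balanced block (the (?1) recursion),
// then "</ul>".
// Wrapping it in a lookahead makes each match zero-width, so the scan
// continues just past every "<ul>" and inner blocks are found too.
$pattern = '~(?=(<ul>(?:(?:(?!</?ul>).)++|(?1))*+</ul>))~s';

preg_match_all($pattern, $source, $matches);

// $matches[1] holds each "<ul>" together with its matching "</ul>",
// outermost block first.
foreach ($matches[1] as $block) {
    echo $block, "\n";
}
```

    Dropping the lookahead and using (?R) in place of (?1) would return only the outermost blocks, which may be enough when the inner blocks are processed in a second pass.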

    Susan

  •  01-09-2010, 12:46 AM 58328 in reply to 58176

    Re: Nested elements

    Thanks, Susan.  You are correct about what I am trying to get.  I always have a problem with recursion.  Getting from the first <ul> to the last </ul> is no problem at all.  This is for two different projects: one is a code indenter and the other is actually a code converter.  I'm also only using <ul></ul> as an example; it could be any elements, and not necessarily HTML.
  •  01-10-2010, 7:19 PM 58366 in reply to 58328

    Re: Nested elements

    I gather from your previous postings that you are familiar with PCRE and the pattern syntax. Therefore I suggest that you look at the "Recursive Patterns" section in http://www.pcre.org/pcre.txt, as it describes the (?R) (and (?1), (?2), etc.) pattern extension that PCRE provides for this type of situation. The example uses "(" and ")" as the opening and closing items, but you should be able to extend it to "<ul>" and "</ul>" (or indeed any other character sequences) based on the overall structure of the example pattern.
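
    A minimal sketch of that extension follows. The helper balanced_pattern() is hypothetical (it is not part of PHP or PCRE), and it assumes the delimiters are fixed literal strings. It keeps the overall shape of a parenthesis-matching pattern (opener, then a repeated choice between delimiter-free text and a recursive call, then closer) and only swaps in the delimiters.

```php
<?php
// Hypothetical helper: builds a recursive pattern for any pair of literal
// opening/closing delimiters.
function balanced_pattern(string $open, string $close): string
{
    // Escape the delimiters so they are treated as literal text.
    $o = preg_quote($open, '~');
    $c = preg_quote($close, '~');

    // opener, then any mix of "text containing neither delimiter" or a
    // nested balanced block (the whole pattern again via (?R)), then closer.
    return '~' . $o . '(?:(?:(?!' . $o . '|' . $c . ').)++|(?R))*+' . $c . '~s';
}

// Parentheses, as in the documentation's style of example.
preg_match_all(balanced_pattern('(', ')'), 'f(a(b)c) + g(d)', $parens);

// The same structure with "<ul>" and "</ul>".
preg_match_all(balanced_pattern('<ul>', '</ul>'), '<ul>a<ul>b</ul></ul>x<ul>c</ul>', $lists);

print_r($parens[0]);
print_r($lists[0]);
```

    Each call returns only the outermost balanced blocks. Keyword pairs such as If/End If would probably need extra care (word boundaries, case-insensitive matching) that this sketch does not attempt.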

    Also, there are many other examples in this forum (and I'm sure on the Internet at large) of patterns that use the PCRE recursion feature.

    If you are still having problems, then let us know as we can help you further.

    Susan
