
get content between html tags

Last post 07-05-2009, 7:07 PM by Aussie Susan. 3 replies.
  •  07-02-2009, 5:05 PM 54521

    get content between html tags

    Hello,

    I would like some help with the following situation.

    I want to match the text between HTML tags:

    start tag = <ul class="Results">

    end tag = </ul>

    text = any characters, including other HTML tags; in general, ALL characters.

    Any help would be appreciated.

  •  07-02-2009, 7:34 PM 54525 in reply to 54521

    Re: get content between html tags

    Normally we have a sticky note at the beginning of this forum that provides the posting guidelines, but I note that it is not there at the moment. Can you please tell us the platform you are using and the regex variant (which library, program, etc.)? Can you also provide us with a small sample of the text you are using?

    I would also suggest that you search through the previous postings in this forum as the question has been asked and answered several times already in various forms.

    However, the best answer I can provide is to use the HTML DOM as this is exactly the problem the DOM is suited to. It will also overcome the issues you will have with a regex failing on (even slightly) malformed HTML code.

    Also, can the "<ul...</ul>" tags be nested? If so, you might be in a situation where only a few regex variants will be able to cope, but the HTML DOM will always come through.

    Susan 

  •  07-05-2009, 4:31 AM 54652 in reply to 54525

    Re: get content between html tags

    A sample of the text is the following:
    <ul class="Result">

            <li class="Image"><a href="..."></a></li>

    ....

    </ul>

    In general, the text between the tags consists of li tags and a tags, with a lot of spaces before and after them.

    I am using PHP 5.2.6 on the XAMPP platform, with the preg_match function for the matching.

    All I want to do is get the text between the tags, all of it, including spaces and tags.

    I would like to know the regex you would write to capture this text between the <ul> tags I described earlier.

    Thank you again, and sorry for not following the rules.

  •  07-05-2009, 7:07 PM 54688 in reply to 54652

    Re: get content between html tags

    Please don't take my comments as a rebuke for not following the rules - at the time you wrote your post the rules were not there to be followed.

    However, you didn't answer my question as to whether the tags can be nested. The following therefore assumes that they cannot be, as this greatly simplifies the pattern.

    The simplest (and possibly fragile) pattern is:

    <ul\b[^>]*>(.*?)</ul>

    where you will need to have the 'singleline' (and possibly the 'ignore case') option(s) set. With PHP's preg functions, these correspond to the s and i pattern modifiers.
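
    As a minimal sketch of how this first pattern might be used with PHP's preg_match: the sample $html string and the variable names below are made up for illustration, and "~" is chosen as the pattern delimiter only so that the "/" in "</ul>" needs no escaping.

```php
<?php
// Hypothetical sample input, modelled on the fragment posted earlier in the thread.
$html = '<div>
<ul class="Results">
    <li class="Image"><a href="..."></a></li>
</ul>
</div>';

// s = "singleline" (the dot also matches newlines), i = ignore case.
$pattern = '~<ul\b[^>]*>(.*?)</ul>~si';

$inner = null;
if (preg_match($pattern, $html, $m)) {
    // Group 1 holds everything between the opening and closing tag,
    // including whitespace and the nested li/a tags.
    $inner = $m[1];
    echo $inner;
}
```

    Note that this matches the first <ul> of any class. If only the "Results" list is wanted, the opening part of the pattern would have to test for that attribute as well.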

    A somewhat more robust pattern is:

    <ul\b[^>]*>(((?!</?ul).)*)</ul>

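    with the same options set. The second pattern refuses to match across another <ul or </ul tag, so on badly formed (or unexpectedly nested) markup it will not return a span that runs from one list into another, whereas the first one will match incorrectly.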

    In both cases, the text between the tags will be in match group #1. 
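
    To see the difference between the two patterns above, one option is to run both against input that breaks the "no nesting" assumption. The input string here is invented for the purpose; it is only a sketch of the behaviour, not a recommendation to use either pattern on nested lists.

```php
<?php
// Hypothetical input with a nested list, which the patterns are not designed for.
$html = '<ul class="Results"><li>a<ul><li>b</li></ul></li></ul>';

$simple = '~<ul\b[^>]*>(.*?)</ul>~si';
$robust = '~<ul\b[^>]*>(((?!</?ul).)*)</ul>~si';

preg_match($simple, $html, $m1);
preg_match($robust, $html, $m2);

// The simple pattern starts at the outer tag and stops at the first </ul>,
// so it returns an unbalanced fragment.
echo $m1[1], "\n";  // <li>a<ul><li>b</li>

// The robust pattern cannot cross the inner <ul, so the attempt at the outer
// tag fails and the match found is the inner, self-contained list.
echo $m2[1], "\n";  // <li>b</li>
```

    Neither result is the full content of the outer list, but the second one is at least a well-formed fragment.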

    If the tags can be nested, then let us know and we can show you the full (but very complex) pattern that will handle this (fortunately, you are using PHP's PCRE regex library, which has the extensions needed for this).
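
    For reference, one possible shape for a nesting-aware pattern uses PCRE's recursion construct (?R). This is only a sketch under the assumption that the markup is well formed; it is not necessarily the full pattern referred to above, and the sample input is invented.

```php
<?php
// Hypothetical input with one level of nesting.
$html = '<ul class="Results"><li>a<ul><li>b</li></ul></li></ul>';

// Between the tags, repeatedly accept either:
//   - a run of characters that does not start a <ul or </ul tag, or
//   - a complete nested <ul>...</ul>, matched by recursing into the whole pattern.
$pattern = '~<ul\b[^>]*>((?:(?:(?!</?ul\b).)++|(?R))*)</ul>~si';

$inner = null;
if (preg_match($pattern, $html, $m)) {
    $inner = $m[1];
    echo $inner;  // <li>a<ul><li>b</li></ul></li>
}
```

    On large pages the pcre.backtrack_limit and pcre.recursion_limit settings may come into play, which is one more reason to treat this as a last resort.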

    However, I would strongly recommend that you look at the HTML DOM for this as it will save you lots of time and frustration both now and when your code needs to be maintained in the future.
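
    A rough sketch of the DOM route in PHP, using the DOMDocument and DOMXPath classes from the bundled DOM extension, might look like the following. The sample markup, the class name "Results" and the variable names are assumptions taken from or modelled on the earlier posts.

```php
<?php
// Hypothetical sample input.
$html = '<div><ul class="Results">
    <li class="Image"><a href="#">one</a></li>
</ul></div>';

// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely clean.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$inner = '';
foreach ($xpath->query('//ul[@class="Results"]') as $ul) {
    // Serialise each child node to rebuild the content between the tags.
    foreach ($ul->childNodes as $child) {
        $inner .= $doc->saveXML($child);
    }
}

echo $inner;
```

    saveXML() is used here because it accepts a node argument on older PHP 5 releases; it serialises as XML, so empty elements come out self-closed. On PHP 5.3.6 and later, saveHTML() accepts a node as well. Nested lists need no special handling with this approach.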

    Susan 
