
how can i get content of a specified tag

Last post 01-06-2010, 11:30 PM by Aussie Susan. 2 replies.
  •  12-25-2009, 4:55 AM 58076

    how can i get content of a specified tag

    Hi

    Could you suggest a regular expression that gets the content of a specific tag that has other tags inside it?

    For example, I want to get the content of the body tag in an HTML document.

  •  12-27-2009, 10:53 AM 58095 in reply to 58076

    Re: how can i get content of a specified tag

    This should do it, at least with PCRE:

    (?i)(<body.+</body>)
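As a rough illustration, here is how the pattern above might be tried from Python. The thread names no host language, so the `re` module, the sample HTML and the helper name `extract_body` are all assumptions. One thing to watch: `.` does not match a newline by default, so this sketch adds the `s` flag to cope with a document that spans several lines.

```python
import re

# Pattern from the post with one assumed addition: the "s" flag.
# i = ignore case, s = dot also matches newline.
BODY_RE = re.compile(r"(?is)(<body.+</body>)")

def extract_body(html):
    """Return the <body>...</body> span of html, or None if there is none."""
    m = BODY_RE.search(html)
    return m.group(1) if m else None

# Hypothetical sample document, spread over several lines.
sample = (
    "<html><head><title>t</title></head>\n"
    "<BODY class='x'>\n"
    "<p>Hi <b>there</b></p>\n"
    "</BODY></html>"
)

print(extract_body(sample))

# Without the "s" flag the same pattern finds nothing here, because the
# opening and closing tags are on different lines.
print(re.search(r"(?i)(<body.+</body>)", sample))  # None
```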

  •  01-06-2010, 11:30 PM 58264 in reply to 58095

    Re: how can i get content of a specified tag

    A word of caution: this may work with the "<body>" tag because (as far as I know) there should be only one in an HTML page.

    Where this is used for tags that can appear multiple times, it will find the first opening tag and match it with the last closing tag in the line (or in the whole document if the "s" [or "singleline" or "dot matches newline"] option is also specified).
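A small sketch of that greedy behaviour, using Python's `re` module purely for illustration (the sample strings and variable names are made up for this example):

```python
import re

line = "<p>one</p> and <p>two</p>"

# Greedy: .+ runs to the end of the line, then backs up to the LAST </p>.
greedy = re.search(r"<p>.+</p>", line).group(0)
print(greedy)  # <p>one</p> and <p>two</p>

doc = "<p>one</p>\n<p>two</p>"

# Without the "s" option the dot stops at the newline, so each line is
# matched separately.
per_line = re.findall(r"<p>.+</p>", doc)
print(per_line)  # ['<p>one</p>', '<p>two</p>']

# With the "s" option the dot crosses the newline and one match swallows
# everything from the first <p> to the last </p> in the document.
whole_doc = re.findall(r"(?s)<p>.+</p>", doc)
print(whole_doc)  # ['<p>one</p>\n<p>two</p>']
```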

    The traditional solution is to stop the '+' quantifier being greedy (the root cause of the above problem) by using ".+?". However, this will match an opening tag with the next closing tag, which may or may not be what you want, especially if the tags can be nested.
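To make the trade-off concrete, here is a short Python sketch (again with made-up sample strings) showing the lazy quantifier doing the right thing for sibling tags and the wrong thing for nested ones:

```python
import re

siblings = "<p>one</p> and <p>two</p>"

# Lazy: .+? stops at the NEXT </p>, so sibling tags are matched one by one.
lazy = re.findall(r"<p>.+?</p>", siblings)
print(lazy)  # ['<p>one</p>', '<p>two</p>']

nested = "<div>outer <div>inner</div> tail</div>"

# With nesting, the outer opening tag gets paired with the inner closing
# tag, and the rest of the outer element is cut off.
cut_short = re.search(r"<div>.+?</div>", nested).group(0)
print(cut_short)  # <div>outer <div>inner</div>
```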

    There are techniques available for the .NET and PCRE regex variants that allow you to correctly match an opening tag with its corresponding closing tag, but they depend on the special capabilities that each variant provides. Each goes about this in a different way and so has its own extensions to the standard syntax or behaviour.
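Python's built-in `re` module supports neither PCRE-style recursion nor .NET-style balancing groups, so the sketch below is a swapped-in technique and not either of those: it uses a regex only to find the opening and closing tags and keeps a depth counter in ordinary code. The function name `match_balanced` and the sample string are hypothetical, and the sketch ignores comments, self-closing tags and other HTML subtleties.

```python
import re

def match_balanced(text, tag):
    """Return the first <tag>...</tag> span with nesting respected, or None."""
    # One pattern for both tag kinds: group 1 is "/" for a closing tag
    # and empty for an opening tag.
    token = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    depth = 0
    start = None
    for m in token.finditer(text):
        if m.group(1) == "":       # opening tag
            if depth == 0:
                start = m.start()  # remember where the outermost tag began
            depth += 1
        elif depth > 0:            # closing tag that has a pending opener
            depth -= 1
            if depth == 0:
                return text[start:m.end()]
    return None                    # no tag found, or tags never balanced

nested = "<div>outer <div>inner</div> tail</div> <div>next</div>"
print(match_balanced(nested, "div"))  # <div>outer <div>inner</div> tail</div>
```

The outer opening tag is paired with its own closing tag, which is the result the lazy ".+?" pattern cannot give for nested input.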

    Susan
