
Text Between Tags

Last post 06-30-2009, 4:10 PM by mihixson. 3 replies.
  •  06-30-2009, 9:27 AM 54442

    Text Between Tags

     I am currently using the following RegEx to match all text between any 2 tags:

    >\s*[^<]*<(?!/script)

    "<tag>blah blah blah</tag>" will match ">blah blah blah<", which is what I want. However, I am dividing a larger HTML file into smaller chunks and matching on those chunks one at a time. The chunk boundaries are arbitrary, so I have no control over where a chunk begins or ends. As a result, the RegEx above fails to match content at the beginning of a chunk if the opening tag is not in the current chunk, and content at the end of a chunk if the closing tag is not there. Basically, if my current chunk is "<tag>blah blah bl", I would like it to match ">blah blah bl", and I would like the chunk "ah blah</tag>" to match "ah blah<". Any ideas?
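To make the problem concrete, here is a minimal sketch in Python (the thread does not say which regex engine is in use, so Python's `re` module is an assumption, and the helper name `matches` is made up for illustration). It runs the expression from the post on a complete element and on the two partial chunks described above.

```python
import re

# Expression from the post, unchanged.
BETWEEN_TAGS = re.compile(r">\s*[^<]*<(?!/script)")

def matches(chunk):
    """Return every match of the posted expression in one chunk."""
    return BETWEEN_TAGS.findall(chunk)

print(matches("<tag>blah blah blah</tag>"))  # ['>blah blah blah<']
print(matches("<tag>blah blah bl"))          # [] (no closing '<' in this chunk)
print(matches("ah blah</tag>"))              # [] (no opening '>' before the text)
```

Both partial chunks come back empty because the expression needs a literal ">" before the text and a literal "<" after it.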

  •  06-30-2009, 10:24 AM 54447 in reply to 54442

    Re: Text Between Tags

    limo(?=([^<>]*<[^>]+>[^<>]*)+$|[^<>]*$)(?=((?!<body).)*</body>)

    This searches for the word "limo" outside of < > tags but within the <body> </body> block.

    It is the solution from a previous question.  See if you can modify it for your needs.
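As a rough illustration of what that expression does, the sketch below runs it in Python (an assumed engine; the sample HTML string is invented for the example). The word appears once as text and once inside an attribute value, and only the text occurrence is reported.

```python
import re

# Expression from the reply, unchanged.
LIMO = re.compile(
    r"limo(?=([^<>]*<[^>]+>[^<>]*)+$|[^<>]*$)(?=((?!<body).)*</body>)"
)

# Invented sample: "limo" as text in <p>, and again inside an href attribute.
html = "<body><p>limo</p><a href='limo'>x</a></body>"

# finditer is used because the expression contains capturing groups,
# which would change what findall returns.
positions = [m.start() for m in LIMO.finditer(html)]
print(positions)  # [9]
```

The nested quantifiers in the first lookahead can backtrack heavily on long inputs that fail to match, so this is probably best tried on small samples first.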


  •  06-30-2009, 3:41 PM 54454 in reply to 54447

    Re: Text Between Tags

    Not having any luck with that expression either.  I think the crux of my problem is that my HTML file is being divided into chunks and I can only access one chunk at a time, so the beginning or the end of the current chunk may not contain a complete pair of opening and closing tags.  The opening tag may be in the previous chunk while the closing tag is in the current chunk.  The way I see it, there are 3 possibilities that I need to account for, and I'm not sure I can do it in a single RegEx.

    1.  match all text between > and < anywhere in the chunk
    2.  match all text before a < if there is no preceding > in the chunk (chunk="abc abc abc</tag><tag>def def def</tag>", match would be "abc abc abc<")
    3.  match all text after a > if there is no succeeding < in the chunk (chunk="<tag>abc abc abc</tag><tag>def def def", match would be ">def def def")

    I don't mind having to use 3 different expressions, but I'm not sure what RegEx would give me 2 and 3 above.  I can't quite wrap my head around the lookbehind/lookahead functionality to give me what I'm looking for.
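One possible reading of the three cases above, sketched as three separate expressions in Python (an assumed engine). The names `BETWEEN`, `LEADING` and `TRAILING` are made up for illustration, and requiring at least one character of text with `+` is a choice made here, not something the thread specifies. No lookaround is needed: anchoring with `^` and `$` and excluding both `<` and `>` from the text is enough.

```python
import re

BETWEEN = re.compile(r">[^<>]+<")   # case 1: text between '>' and '<'
LEADING = re.compile(r"^[^<>]+<")   # case 2: text before the first '<', no '>' ahead of it
TRAILING = re.compile(r">[^<>]+$")  # case 3: text after the last '>', no '<' behind it

chunk2 = "abc abc abc</tag><tag>def def def</tag>"
chunk3 = "<tag>abc abc abc</tag><tag>def def def"

print(LEADING.search(chunk2).group())   # abc abc abc<
print(BETWEEN.findall(chunk2))          # ['>def def def<']
print(TRAILING.search(chunk3).group())  # >def def def
print(BETWEEN.findall(chunk3))          # ['>abc abc abc<']
```

Because `[^<>]` excludes both delimiters, the leading and trailing expressions cannot run across a tag, which is what keeps cases 2 and 3 from overlapping with case 1.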

  •  06-30-2009, 4:10 PM 54455 in reply to 54454

    Re: Text Between Tags

    Apparently it was easier than I was making it out to be.  After a little more playing around, the following expression seems to work:

    ^[^>]*<|>\s*[^<]*<|>[^<]*$
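A quick check of that expression against the example chunks from this thread, sketched in Python (an assumed engine; the helper `text_pieces` is hypothetical). Note that the expression also returns delimiter-only matches such as "><" where two tags are adjacent, and it no longer carries the `(?!/script)` lookahead from the first post, so the sketch filters out empty pieces afterwards.

```python
import re

# Expression from the final post, unchanged.
CHUNK_TEXT = re.compile(r"^[^>]*<|>\s*[^<]*<|>[^<]*$")

def text_pieces(chunk):
    """Return the non-empty text found in one chunk, delimiters stripped."""
    pieces = (m.strip("<>").strip() for m in CHUNK_TEXT.findall(chunk))
    return [p for p in pieces if p]

chunk2 = "abc abc abc</tag><tag>def def def</tag>"
chunk3 = "<tag>abc abc abc</tag><tag>def def def"

print(CHUNK_TEXT.findall(chunk2))  # ['abc abc abc<', '><', '>def def def<', '>']
print(text_pieces(chunk2))         # ['abc abc abc', 'def def def']
print(text_pieces(chunk3))         # ['abc abc abc', 'def def def']
print(text_pieces("ah blah</tag>"))  # ['ah blah']
```

A chunk that contains no "<" or ">" at all produces no match with this expression, so text that falls entirely inside one chunk boundary would need separate handling.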
