Not having any luck with that expression either. I think the crux of my problem is that my HTML file is being divided into chunks and I can only access one chunk at a time, so the beginning or end of the current chunk may not contain a complete pair of opening and closing tags. The opening tag may be in the previous chunk, while the closing tag is in the current chunk. The way I see it, there are 3 possibilities that I need to account for, and I'm not sure I can do it in a single RegEx.
1. match all text between > and < anywhere in the chunk
2. match all text before a < if there is no preceding > in the chunk (chunk="abc abc abc</tag><tag>def def def</tag>", match would be "abc abc abc<")
3. match all text after a > if there is no succeeding < in the chunk (chunk="<tag>abc abc abc</tag><tag>def def def", match would be ">def def def")
I don't mind having to use 3 different expressions, but I'm not sure what RegEx would give me 2 and 3 above. I can't quite wrap my head around the lookbehind/lookahead functionality well enough to get what I'm looking for.
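
For reference, here is a minimal sketch of the three cases as three separate expressions. It assumes Python's `re` module, since the post does not say which RegEx flavor is in use, and the names `LEADING`, `BETWEEN`, `TRAILING` and `chunk_text` are made up for illustration. Cases 2 and 3 are handled by anchoring to the start (`\A`) and end (`\Z`) of the chunk rather than with lookbehind, and the text is captured in a group so the `<` and `>` delimiters are not part of the result.

```python
import re

# Case 2: text at the very start of the chunk, before the first "<",
# with no ">" in front of it. \A anchors to the start of the chunk.
LEADING = re.compile(r"\A([^<>]*)<")

# Case 1: text between a ">" and the next "<". The lookahead (?=<)
# checks for the "<" without consuming it.
BETWEEN = re.compile(r">([^<>]*)(?=<)")

# Case 3: text after the last ">" with no "<" following it.
# \Z anchors to the very end of the chunk.
TRAILING = re.compile(r">([^<>]*)\Z")


def chunk_text(chunk):
    """Return the non-empty text runs found in one chunk, in order."""
    parts = []

    m = LEADING.match(chunk)
    if m and m.group(1):
        parts.append(m.group(1))

    # findall returns the captured group for each match; skip empty
    # runs such as the one between "</tag>" and "<tag>".
    parts.extend(t for t in BETWEEN.findall(chunk) if t)

    m = TRAILING.search(chunk)
    if m and m.group(1):
        parts.append(m.group(1))

    return parts


print(chunk_text("abc abc abc</tag><tag>def def def</tag>"))
print(chunk_text("<tag>abc abc abc</tag><tag>def def def"))
```

Both example chunks from the list above should yield the two text runs `abc abc abc` and `def def def`. A chunk that contains no `<` or `>` at all falls outside the three cases and returns nothing here, so it would need its own check.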