Hi ddrudik,
I apologize for that. Let me elaborate further on my original post.
We have a site where we allow users to submit content. This content is HTML (we provide a WYSIWYG editor), and many people copy and paste from MS Word.
So I first clean up what they submit (I use tidy plus some other logic to clean up the source). By the time the cleaning process is complete, I'm left with a "workable" set of HTML.
I then need to split the content up into multiple pages. We define a page as having N words. This means I have to walk through the HTML, count words until I hit N, end that page (closing any open markup with the appropriate closing tags), then start a new page whose opening tags continue the tags the content was inside at the point of the split.
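To make the requirement concrete, here is a rough Python sketch of the kind of splitter I mean. It is only an illustration, not what we actually run: the names `WordPaginator` and `paginate` are made up, it assumes the HTML is already well-formed after the cleanup step, and it assumes a "word" is any whitespace-separated token containing at least one letter or digit (so a lone "." after a closing tag stays on the current page).

```python
from html import escape
from html.parser import HTMLParser
import re

# Elements that never take a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class WordPaginator(HTMLParser):
    """Splits already-cleaned HTML into pages of at most N words (hypothetical helper)."""

    def __init__(self, words_per_page):
        super().__init__(convert_charrefs=True)
        self.words_per_page = words_per_page
        self.pages = []      # finished pages, each a balanced HTML fragment
        self.open_tags = []  # stack of (tag name, raw start-tag text)
        self.buffer = []     # pieces of the page currently being built
        self.count = 0       # words on the current page so far

    def _page_is_full(self):
        return self.count >= self.words_per_page

    def _start_new_page(self):
        # Drop trailing whitespace, then close every open tag, innermost first.
        while self.buffer and self.buffer[-1].isspace():
            self.buffer.pop()
        closers = [f"</{tag}>" for tag, _ in reversed(self.open_tags)]
        self.pages.append("".join(self.buffer + closers).strip())
        # Reopen the same tags, with their original attributes, on the next page.
        self.buffer = [raw for _, raw in self.open_tags]
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if self._page_is_full():
            self._start_new_page()  # split first, so no empty element is left behind
        raw = self.get_starttag_text()
        self.buffer.append(raw)
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, raw))

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: copy it through, nothing to track.
        self.buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Assumes well-formed input; stray closing tags are simply dropped.
        if self.open_tags and self.open_tags[-1][0] == tag:
            self.open_tags.pop()
            self.buffer.append(f"</{tag}>")

    def handle_data(self, data):
        for token in re.findall(r"\s+|\S+", data):
            is_word = any(ch.isalnum() for ch in token)
            if is_word and self._page_is_full():
                self._start_new_page()
            # Text arrives with entities decoded, so re-escape it on the way out.
            self.buffer.append(escape(token, quote=False))
            if is_word:
                self.count += 1


def paginate(html_text, words_per_page):
    """Return a list of HTML fragments, one per page."""
    parser = WordPaginator(words_per_page)
    parser.feed(html_text)
    parser.close()
    if "".join(parser.buffer).strip():
        parser._start_new_page()  # flush the final, possibly short, page
    return parser.pages
```

Usage would be something like `pages = paginate(cleaned_html, N)`, which gives back one HTML fragment per page. Keeping the raw start-tag text on the stack is what lets each new page reopen tags with their original attributes (for example the inline style on a span).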
Here is an example: http://fanfictiondev.myfandoms.com/index.php/fanfiction/show/317?site_id=13
But the problem is visible even on the first page of that example. You'll notice the paragraph, "Excellent. I’m so relieved to hear it wasn’t a date that I was suddenly myself and instead of trying to figure out what he hell was happening, I acted like myself. I kissed her good night. You know…straight girl kissing good night. Not that I ever thought of it that way before, but I guess that’s .....", where the whole paragraph comes out italicized. Only the word "Excellent." should be italicized.
Here is a further example of what I need to do:
CONTENT:
<p>What was supposed to happen next? What the hell was I supposed to do next?</p>
<blockquote>“I’m actually meeting my brother in Brooklyn for dinner today.”</blockquote>
<p><span style='font-weight:bold;'><em>That is excellent</em></span>. I’m so relieved to hear it wasn’t a date that I was suddenly myself and instead of trying to figure out what the hell was happening.</p>
<p>But then things changed. <strong>Big time</strong>. I found myself not in the usual “peck and run” mode.</p>
<p style='color:blue;'>And she did</p>
RESULTS OF PAGE SPLITTING IF WE SPLIT EVERY 3 WORDS:
PAGE 1:
<p>What was supposed</p>
PAGE 2:
<p>to happen next?</p>
PAGE 3:
<p>What the hell</p>
PAGE 4:
<p>was I supposed</p>
PAGE 5:
<p>to do next?</p>
PAGE 6:
<blockquote>“I’m actually meeting</blockquote>
PAGE 7:
<blockquote>my brother in</blockquote>
PAGE 8:
<blockquote>Brooklyn for dinner</blockquote>
PAGE 9:
<blockquote>today.”</blockquote>
<p><span style='font-weight:bold;'><em>That is</em></span></p>
PAGE 10:
<p><span style='font-weight:bold;'><em>excellent</em></span>. I’m so</p>
PAGE 11:
<p>relieved to hear</p>
PAGE 12:
<p>it wasn’t a</p>
PAGE 13:
<p>date that I</p>
PAGE 14:
<p>was suddenly myself</p>
PAGE 15:
<p>and instead of</p>
PAGE 16:
<p>trying to figure</p>
PAGE 17:
<p>out what the</p>
PAGE 18:
<p>hell was happening.</p>
PAGE 19:
<p>But then things</p>
PAGE 20:
<p>changed. <strong>Big time</strong>.</p>
PAGE 21:
<p>I found myself</p>
PAGE 22:
<p>not in the</p>
PAGE 23:
<p>usual “peck and</p>
PAGE 24:
<p>run” mode.</p>
<p style='color:blue;'>And</p>
PAGE 25:
<p style='color:blue;'>she did</p>
Hopefully this makes a bit more sense.
Thank you for your continued help (and patience).