I have the following text, in which each entry starts with a section number and a subsection number separated by a colon, followed by several words of text (I used lorem ipsum for this example):
1:1 Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed commodo, nisi non faucibus consequat, purus neque ultrices dolor, eget ornare ligula diam a velit. 1:2 Cras auctor massa eget diam ultrices rutrum. Sed convallis purus non nibh sollicitudin non pulvinar leo lacinia. Nunc luctus risus id elit accumsan quis auctor augue posuere. 1:3 Nullam urna lectus, molestie id vehicula sit amet, convallis luctus justo. Phasellus vestibulum mi non ligula pulvinar pulvinar. Nullam egestas imperdiet diam id adipiscing. 1:4 Etiam ornare tincidunt dictum. Maecenas aliquam venenatis massa, et viverra arcu tempus eu. Mauris nisl arcu, interdum vel aliquet in, malesuada eget turpis. Sed vitae sapien tortor, nec varius dolor. Integer egestas condimentum tincidunt. Mauris eget libero non ligula viverra dapibus. Mauris nunc nisi, facilisis nec volutpat at, cursus non quam. Donec ac erat quis enim vehicula auctor id quis nibh. Proin nisl lacus, viverra sed vehicula quis, convallis vel arcu. Nunc ut urna in orci pretium consequat semper in felis. 1:5 Phasellus elementum, velit eget facilisis ultricies, nibh magna lacinia mi, vitae pellentesque mauris lorem a enim. Nunc quis iaculis turpis. Ut egestas ante eu urna sagittis blandit. Aliquam venenatis diam sit amet purus egestas sollicitudin. 1:6 Phasellus nec nunc a leo commodo posuere. Nam sit amet dui a mauris tristique feugiat. Duis dui turpis, ultricies et venenatis at, imperdiet eu ipsum. Nullam et nunc massa. Aenean tellus quam, fringilla non imperdiet ac, pulvinar non augue.
I plan to separate the section numbers, subsection numbers (losing the colon) and word text into separate columns in a database table.
I know I can capture the section numbers like this:
[0-9]+:
and the subsection numbers like this:
:[0-9]+
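As a quick check of those two patterns, here is a minimal sketch, assuming Python's `re` module (the language is my assumption, and `sample` is a shortened, made-up stand-in for the real text). It shows that the patterns as written keep the colon in the match, while wrapping the digits in a capture group returns the numbers alone:

```python
import re

# Hypothetical, shortened sample in the same shape as the text above.
sample = (
    "1:1 Lorem ipsum dolor sit amet. "
    "1:2 Cras auctor massa eget diam. "
    "2:10 Nullam urna lectus."
)

# The patterns as written include the colon in each match.
sections_with_colon = re.findall(r"[0-9]+:", sample)     # ['1:', '1:', '2:']
subsections_with_colon = re.findall(r":[0-9]+", sample)  # [':1', ':2', ':10']

# With a capture group, findall returns only the grouped digits.
sections = re.findall(r"([0-9]+):", sample)     # ['1', '1', '2']
subsections = re.findall(r":([0-9]+)", sample)  # ['1', '2', '10']

print(sections, subsections)
```

The capture-group form drops the colon without a separate clean-up step.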
What I need help with is a regex that will capture all the words between each pair of section/subsection numbers.
I tried this:
([a-zA-Z]+.)
but it only captured one word at a time, and I need to capture all the words between any two section numbers.
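To illustrate the behaviour, here is a small sketch, again assuming Python's `re` module and a made-up, shortened sample. `[a-zA-Z]+` stops at the first character that is not a letter, and the unescaped `.` then matches just that one character (a space or a full stop), so each match is a single word plus one trailing character:

```python
import re

# Hypothetical, shortened sample; the real text is much longer.
sample = "1:1 Lorem ipsum dolor. 1:2 Cras auctor."

# One or more letters followed by exactly one arbitrary character.
matches = re.findall(r"([a-zA-Z]+.)", sample)

print(matches)  # ['Lorem ', 'ipsum ', 'dolor.', 'Cras ', 'auctor.']
```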
Can anyone help?
I am probably going to have to pass over my document three times with a separate regex each time and store the matches in an array.
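For comparison, a single pass with three capture groups might be enough. The following is only a sketch under some assumptions: Python's `re` module, a made-up shortened sample, a hypothetical helper name `split_entries`, and body text that never contains something shaped like `digits:digits` (a time such as 2:30 would be mistaken for a new entry). The lazy `(.*?)` collects text until the lookahead sees the next section marker or the end of the input:

```python
import re

# Hypothetical, shortened sample in the same shape as the text above.
sample = (
    "1:1 Lorem ipsum dolor sit amet. "
    "1:2 Cras auctor massa eget diam. "
    "2:10 Nullam urna lectus."
)

# Group 1: section number, group 2: subsection number,
# group 3: the text up to (not including) the next "N:N" marker or the end.
# re.DOTALL lets the text span line breaks.
ENTRY = re.compile(r"(\d+):(\d+)\s+(.*?)(?=\s*\d+:\d+|\s*$)", re.DOTALL)

def split_entries(text):
    """Return (section, subsection, text) tuples, one per entry."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3))
        for m in ENTRY.finditer(text)
    ]

rows = split_entries(sample)
for row in rows:
    print(row)
# (1, 1, 'Lorem ipsum dolor sit amet.')
# (1, 2, 'Cras auctor massa eget diam.')
# (2, 10, 'Nullam urna lectus.')
```

Each tuple maps directly onto one row of the three-column table, so no separate passes or intermediate arrays are needed in this version.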
Randy H. Johnson