Firstly, I need to point out that, while you use different words such as "apostrophe" and "single quote" to differentiate the *usage* of a character, such distinctions are meaningless to a regex. A regex works at the level of individual characters, so if an "apostrophe" and a "single quote" are represented by the same character (i.e. they have the same character code), the regex will see them as identical.
Now, let me show you why regex patterns get very messy very quickly.
Let's start with wanting to locate all spaces that are NOT between balanced single quotes:
\s+(?=[^']*('[^']*'[^']*)*$)
This locates a run of white space and then uses a lookahead to make sure that, if there are any single quotes between the space and the end of the line, there is a balancing "end quote" for each one (in other words, an even number of quotes follows). (Note the assumption here that you are using the 'multiline' option and that there are no line breaks between the quotes.)
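As a rough illustration of this first pattern, here is a minimal sketch in Python (the language and the sample sentence are assumptions made purely for the example). Python's re.split() also returns the text of any capturing group, so the inner group is written here as a non-capturing (?:...) group; otherwise the pattern is unchanged.

```python
import re

# Same idea as the pattern above. The inner group is non-capturing because
# re.split() would otherwise add the group's text to the result list.
OUTSIDE_QUOTES = re.compile(r"\s+(?=[^']*(?:'[^']*'[^']*)*$)")

sample = "one 'two three' four"   # made-up test text
parts = OUTSIDE_QUOTES.split(sample)
print(parts)   # ['one', "'two three'", 'four']
```

The space inside the quotes is followed by an odd number of quotes (one), so the lookahead fails there and no split happens.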
Now, let's bring in your distinction between an apostrophe (with an alphanumeric on both sides) and a single quote (which does NOT have an alphanumeric on BOTH sides). Using this, an apostrophe can be specified with the pattern
(?<=\w)'(?=\w)
The problem is that we want the opposite of this. Applying normal boolean logic we can get
((?<!\w)'|'(?!\w))
which will match a ' character that lacks an alphanumeric on at least one side. (The outer set of parentheses is included here to limit the effect of the alternation: we will be inserting this sub-pattern into the first one above in several places, so we need to contain the low precedence of the '|'.)
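To see the two sub-patterns side by side, here is a small Python sketch (Python and the sample sentence are assumptions for illustration only). It lists the positions at which each pattern matches.

```python
import re

APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")         # word character on both sides
SINGLE_QUOTE = re.compile(r"((?<!\w)'|'(?!\w))")   # word character missing on at least one side

sample = "He said 'eat at Joe's' today"   # made-up test text

apostrophes = [m.start() for m in APOSTROPHE.finditer(sample)]
quotes = [m.start() for m in SINGLE_QUOTE.finditer(sample)]
print(apostrophes)   # [19]
print(quotes)        # [8, 21]
```

Every ' in the sample is claimed by exactly one of the two patterns, which is what the boolean negation should give. Bear in mind that \w also accepts digits and the underscore.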
We can use this to match a single quote but not an "apostrophe". However, we have used the ' character within a character class in the first pattern, and we cannot substitute this new pattern for a ' inside a character class. Therefore we need to realise that the character class matches all characters except the ', and so we can use an alternate form of '[^']*' which is:
((?!').)*
assuming that the 'singleline' option is set so that we can get exactly the same matches as the character set original.
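A quick way to check this equivalence is sketched below in Python, which is an assumption for the example; there the 'singleline' option is spelled re.DOTALL. The sample text is made up and deliberately contains a line break before the quote.

```python
import re

sample = "ab\ncd'ef"   # made-up text with a line break before the quote

by_class = re.match(r"[^']*", sample).group()
by_lookahead = re.match(r"((?!').)*", sample, re.DOTALL).group()
without_flag = re.match(r"((?!').)*", sample).group()

print(by_class == by_lookahead)   # True: both stop just before the '
print(repr(without_flag))         # 'ab': without the flag, . stops at the line break
```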
Now we can substitute '((?<!\w)'|'(?!\w))' for ' and '((?!').)*' for '[^']*' in the original pattern and we get:
\s+(?=((?!(?<!\w)'|'(?!\w)).)*(((?<!\w)'|'(?!\w))((?!(?<!\w)'|'(?!\w)).)*((?<!\w)'|'(?!\w))((?!(?<!\w)'|'(?!\w)).)*)*$)
We can now use this in a regex "Split" operation, as it will find any (sequence of one or more) white space characters that are followed by an even number of "single quotes" (including none), bearing in mind the distinction between a "single quote" and an "apostrophe".
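The substitution is easier to follow (and to get right) if the pattern is assembled from named pieces. The Python sketch below is one way to do that, under some assumptions: Python is chosen just for the example, the names QUOTE, NOT_QUOTE and SPLITTER are made up, the groups are non-capturing so that re.split() does not return their text, and re.DOTALL stands in for the 'singleline' option.

```python
import re

QUOTE = r"(?:(?<!\w)'|'(?!\w))"           # a ' that is not an apostrophe
NOT_QUOTE = r"(?:(?!" + QUOTE + r").)*"   # a run of characters containing no such quote

# White space followed by an even number of "single quotes" up to the end.
SPLITTER = re.compile(
    r"\s+(?=" + NOT_QUOTE + r"(?:" + QUOTE + NOT_QUOTE + QUOTE + NOT_QUOTE + r")*$)",
    re.DOTALL,
)

sample = "He said 'eat at Joe's' today"   # made-up test text
parts = SPLITTER.split(sample)
print(parts)   # ['He', 'said', "'eat at Joe's'", 'today']
```

The apostrophe in Joe's is skipped over as an ordinary character, so the quoted phrase stays in one piece.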
It is also possible to have this handle double quotes as well as single quotes, in two ways. The first is to substitute '['"]' wherever a ' character appears in the pattern, but this means that double and single quotes can be used interchangeably, which, depending on your text, may lead to strange results; for example,
Joe"s
would be treated as an apostrophe. The second is to go back to the original pattern and make it:
\s+(?=[^'"]*((['"])((?!\2).)*\2[^'"]*)*$)
which will handle balanced single or double quotes correctly (and, incidentally, does NOT have a problem with "eat at Joe's"!). You can then go through the process I've outlined above to apply the different meanings of single quotes and apostrophes. I'll leave this as an exercise for the reader!
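For what it's worth, the two-quote pattern (before any apostrophe refinement) can be tried out with the Python sketch below. Python, the sample sentence and the helper split_at_matches are all assumptions for the example. The helper is used instead of re.split() because re.split() would add the captured quote character to the result, and the groups need to stay capturing so that \2 keeps its meaning.

```python
import re

# The pattern exactly as given above; \2 refers back to the opening quote.
EITHER_QUOTE = re.compile(r"""\s+(?=[^'"]*((['"])((?!\2).)*\2[^'"]*)*$)""")

def split_at_matches(pattern, text):
    """Hypothetical helper: cut text at each match, ignoring capture groups."""
    parts, last = [], 0
    for m in pattern.finditer(text):
        parts.append(text[last:m.start()])
        last = m.end()
    parts.append(text[last:])
    return parts

sample = "She wrote \"eat at Joe's\" on the sign"   # made-up test text
parts = split_at_matches(EITHER_QUOTE, sample)
print(" | ".join(parts))   # She | wrote | "eat at Joe's" | on | the | sign
```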
I'll also let you decide if this answers your question "Would it help if..... "
Susan