Try:

^((?!(A)|(B)).)*(((A)\s+(B))|((B)\s+(A)))((?!(C)|(D)).)+(((C)\s+(D))|((D)\s+(C)))\s*$

I've tried this against the following test cases:

A B C D

B A C D

A B D C

B A D C

C D A B

D C A B

C D B A

D C B A

A C B D

A C C D

D A C B

and it matches only the first 4 cases.
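
If you want to repeat that check yourself, here is a minimal sketch. It assumes Python's `re` module (not necessarily the engine I tested with) and treats A, B, C and D as literal letters; the names `PATTERN`, `CASES` and `matched` are just for illustration.

```python
import re

# The pattern from the top of this answer, with A, B, C, D as literal letters.
PATTERN = re.compile(
    r"^((?!(A)|(B)).)*(((A)\s+(B))|((B)\s+(A)))"
    r"((?!(C)|(D)).)+(((C)\s+(D))|((D)\s+(C)))\s*$"
)

# The eleven test cases, in the order listed above.
CASES = [
    "A B C D", "B A C D", "A B D C", "B A D C",
    "C D A B", "D C A B", "C D B A", "D C B A",
    "A C B D", "A C C D", "D A C B",
]

# Each case is checked as its own line.
matched = [case for case in CASES if PATTERN.search(case)]
print(matched)
```

Each case is tested as a separate string here, so no multiline option is needed.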

Let's start by treating A, B, etc. as individual characters. Let's also say we are looking for "A B" or "B A" anywhere in the text, with whitespace the only characters allowed in between.

The first part is to skip over any character that is neither A nor B. This is simply '[^AB]*'.

At this point we have 3 situations:

1) we are at the end of the line (i.e. there are no A or B characters) - this will be handled as an error because neither of the other 2 options will match

2) the next character is "A" and therefore we must have only whitespace to the next character which must be "B"

3) the next character is "B" and therefore we must have only whitespace to the next character which must be "A"

We can handle this with an alternation such as

A\s+B|B\s+A

(remember that alternation has a very low precedence, so the entire pattern on either side will be considered as a whole)

We can put this together and get:

[^AB]*(A\s+B|B\s+A)
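
As a quick sanity check of this single-character version, here is a small sketch, again assuming Python's `re` module; the sample strings are made up for illustration.

```python
import re

# Single-character version: skip anything that is not A or B,
# then require A and B in either order with only whitespace between them.
single_char = re.compile(r"[^AB]*(A\s+B|B\s+A)")

samples = ["xx A B", "xx B   A", "xx A x B", "xx"]
results = {text: bool(single_char.match(text)) for text in samples}

for text, ok in results.items():
    print(f"{text!r}: {ok}")
```

The third sample fails because something other than whitespace sits between the A and the B, and the last one fails because there is no A or B at all (situation 1 above).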

At this point, let us remind ourselves that A and B are complete words and not just letters. This means that the '[^AB]' part isn't going to work, but we can create an equivalent that WILL work with words:

((?!A|B).)*

(There is an alternative that is often used in this situation, '.*?(A|B)', but I don't want to use it here because it actually matches the A or B and therefore moves the text pointer to AFTER it, whereas we want to check that text later on.)

Thus we have

((?!A|B).)*(A\s+B|B\s+A)
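
To see the difference between the two ways of skipping, here is a rough sketch assuming Python's `re` module. The words "sodium" and "chlorine" are hypothetical stand-ins for your real A and B.

```python
import re

# Hypothetical stand-ins for the real words A and B.
A, B = re.escape("sodium"), re.escape("chlorine")

skip = re.compile(rf"((?!{A}|{B}).)*")  # stops just BEFORE the first A or B
lazy = re.compile(rf".*?({A}|{B})")     # consumes the first A or B as well

text = "mix sodium chlorine"
skip_end = skip.match(text).end()
lazy_end = lazy.match(text).end()
print(skip_end, lazy_end)  # prints: 4 10

# Combined with the alternation, the words must be next to each other.
pair = re.compile(rf"((?!{A}|{B}).)*({A}\s+{B}|{B}\s+{A})")
adjacent = bool(pair.match("mix sodium chlorine"))
separated = bool(pair.match("mix sodium and chlorine"))
print(adjacent, separated)  # prints: True False
```

The lookahead version leaves the text pointer at position 4, right in front of "sodium", which is what lets the alternation that follows test the word itself.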

In this form, we really don't NEED to put parentheses around the A's and B's, but if we did then it would look like:

((?!(A)|(B)).)*((A)\s+(B)|(B)\s+(A))

This is beginning to look very like the first part of the pattern we had at the start. All we need to do is to create a similar pattern for the C and D values:

((?!(C)|(D)).)*((C)\s+(D)|(D)\s+(C))

using the same logic.

If I extrapolate from your example about the reagents, and assume that you are NOT talking about an equilibrium relation where the left and right sides can be exchanged as whole entities (i.e. "A B C D" is valid but "C D A B" is not - I'll get to that later if necessary), and assuming that a single answer occurs on a single line, then we can have:

^ - start at the beginning of the line

((?!(A)|(B)).)*((A)\s+(B)|(B)\s+(A)) - require a match of A and B in either order

((?!(C)|(D)).)+((C)\s+(D)|(D)\s+(C)) - require a match of C and D in either order (the '+' means at least one character must separate them from the A/B pair)

\s*$ - allow for trailing whitespace and then the end of the line

which, when all on one line, is what we started with.
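
If the real A, B, C and D are whole words, the same pattern can be assembled from them. This is only a sketch assuming Python's `re` module: `ordered_pattern` is a hypothetical helper name, and the reagent words are invented for the example.

```python
import re

def ordered_pattern(a: str, b: str, c: str, d: str) -> re.Pattern:
    """Build the A/B-then-C/D pattern for four literal words (hypothetical helper)."""
    a, b, c, d = (re.escape(word) for word in (a, b, c, d))
    return re.compile(
        rf"^((?!{a}|{b}).)*({a}\s+{b}|{b}\s+{a})"   # A and B in either order
        rf"((?!{c}|{d}).)+({c}\s+{d}|{d}\s+{c})"    # then C and D in either order
        rf"\s*$"                                     # trailing whitespace, end of line
    )

# Invented words standing in for the real reagents.
pattern = ordered_pattern("hydrogen", "oxygen", "water", "heat")

forward = bool(pattern.search("hydrogen oxygen water heat"))
swapped_pairs = bool(pattern.search("oxygen hydrogen heat water"))
reversed_sides = bool(pattern.search("water heat hydrogen oxygen"))
print(forward, swapped_pairs, reversed_sides)
```

`re.escape` is there in case a word contains characters that mean something special in a pattern.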

Now, if you want to handle the "equilibrium" reaction case, we need to turn the middle parts into lookaheads and the last part into a match of everything (as you have in your original pattern). Therefore we add

(?= ........ )

around the 2 middle parts and use '.*$' at the end and get:

^(?=((?!(A)|(B)).)*(((A)\s*(B))|((B)\s*(A))))(?=((?!(C)|(D)).)*(((C)\s*(D))|((D)\s*(C)))).*$

This matches the first 8 of the test cases above.
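
The same kind of check can be sketched for the lookahead version, again assuming Python's `re` module and literal letters; `EITHER_SIDE`, `cases` and `hits` are illustrative names only.

```python
import re

# Lookahead version: both pairs are checked from the start of the line,
# so it does not matter which pair comes first.
EITHER_SIDE = re.compile(
    r"^(?=((?!(A)|(B)).)*(((A)\s*(B))|((B)\s*(A))))"
    r"(?=((?!(C)|(D)).)*(((C)\s*(D))|((D)\s*(C)))).*$"
)

cases = [
    "A B C D", "B A C D", "A B D C", "B A D C",
    "C D A B", "D C A B", "C D B A", "D C B A",
    "A C B D", "A C C D", "D A C B",
]

hits = [case for case in cases if EITHER_SIDE.search(case)]
print(len(hits), hits)
```

Because each lookahead starts again from the beginning of the line, the A/B check and the C/D check are independent of each other.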

By the way, in all of these I've used the "Ignore Case" and "Ignore Whitespace" options in my testing, which has helped me to create the patterns as separate lines, and also the "Multiline" option, which lets me create the multiple-line test case.
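
For what it's worth, here is how those three options might look in Python. This is an assumption on my part about the nearest equivalents (`re.IGNORECASE`, `re.VERBOSE` and `re.MULTILINE`); your engine may name them differently. The test text is made up.

```python
import re

# re.VERBOSE ignores whitespace in the pattern, so each part can sit on its own line.
pattern = re.compile(
    r"""
    ^
    ((?!a|b).)* (a\s+b | b\s+a)   # A and B in either order
    ((?!c|d).)+ (c\s+d | d\s+c)   # then C and D in either order
    \s*$
    """,
    re.IGNORECASE | re.VERBOSE | re.MULTILINE,
)

# Several test cases in one string; re.MULTILINE makes ^ and $ work per line.
text = "A B C D\nc d a b\nb a d c\nA C B D"
hits = [m.group(0) for m in pattern.finditer(text)]
print(hits)  # prints: ['A B C D', 'b a d c']
```

One thing to keep in mind with a multi-line test string is that '\s' also matches a line break, so blank lines between test cases can end up inside a match.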

If I've made any incorrect assumptions about anything in deriving these patterns, then I hope you can see where to include the corrections, or please let me know and I'll see how we can incorporate them.

I hope this all makes sense.

Susan