There are several things going on here and it might be easiest to separate them out.
For a start, I suspect the matching process is actually returning an array of captures (which is typically what you want, especially given that your pattern specifies multiple capture groups). The matching process returns the complete matched string as capture group 0 (element 0 in your array), hence the "as the custodian of the". For each set of parentheses in the pattern, the regex engine creates a capture group, numbering them from 1 in the order their opening parentheses appear. Therefore, you have defined capture group #1 as "AS", #2 as "THE", #3 as "FOR", #4 as "OF" and #5 as "THE".
Your test string does not contain "for", which is why capture group #3 comes back as a null string in the output.
Therefore, if all you want is the complete matched string, just use the first ('zeroth') element of the returned array.
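To make the group numbering concrete, here is a small sketch in Python (the question doesn't say which language is in use, so this is an assumption). The pattern is a hypothetical reconstruction with five capture groups and '\W*' around each word, and the test string is made up. Note that Python's `re` reports an unmatched group as `None`; other engines may give you a null or empty string instead.

```python
import re

# Hypothetical reconstruction of a pattern like the one in the question:
# five capture groups, with \W* padding around every word.
PATTERN = r"\W*(AS)*\W*(THE)*\W*CUSTODIAN\W*(FOR)*\W*(OF)*\W*(THE)*\W*"
text = "John Doe as the custodian of the trust"  # made-up test string

m = re.search(PATTERN, text, re.IGNORECASE)

whole = m.group(0)   # group 0: the complete matched text
groups = m.groups()  # groups 1..5, numbered by their opening parenthesis

print(repr(whole))   # ' as the custodian of the '
print(groups)        # ('as', 'the', None, 'of', 'the')
```

Group 3 is `None` because nothing in the text matched "FOR", and group 0 carries a leading and a trailing space.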
If you look carefully at the string returned in the first element of the array, you will see that it is followed by a trailing space. This is because you have a '\W*' right at the end of the pattern. (There is also a space at the beginning of the actual string, which you may not have noticed or copied into your posting; it is caused in exactly the same way by the '\W*' at the start of your pattern.) The '*' quantifier means "match zero or more occurrences of the preceding item, matching as many times as possible". Here the preceding item is '\W', which matches any non-word character, i.e. anything other than a letter, a digit or an underscore, including the space character. This works quite well to account for the spaces between the optional words, but it will also capture any non-word characters (line feeds, punctuation, etc.) before and after the key words.
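A quick Python sketch (an assumed language, as above) of what '\W' and a greedy '\W*' actually pick up; the sample strings are made up for illustration.

```python
import re

# \W matches any character that is not a word character; for str patterns in
# Python 3, word characters are Unicode letters, digits and the underscore.
non_word = re.findall(r"\W", "a b\n_c.")
print(non_word)  # [' ', '\n', '.'] (no '_')

# '*' is greedy: \W* grabs every non-word character it can at that position.
leading = re.match(r"\W*", "  ...as the").group(0)
print(repr(leading))  # '  ...'
```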
In this case it might be better to use the zero-width assertion '\b' at the beginning and end of the pattern. This ensures that "AS" matches only the whole word "as" and not the end of (for example) "has", and because '\b' consumes no characters, it will not include any spurious characters at the start or end.
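The sketch below (Python, assumed) compares '\W*' padding with '\b' on a single word, then tries a hypothetical full pattern with '\b' at both ends on a made-up sentence. That full pattern is only an assumption for illustration, not necessarily the exact pattern recommended here.

```python
import re

# Without word boundaries, "AS" is found inside "has".
loose = re.search(r"\W*(AS)\W*", "He has it", re.IGNORECASE)
print(repr(loose.group(0)))  # 'as '

# \b requires a word boundary but consumes no characters.
print(re.search(r"\bAS\b", "He has it", re.IGNORECASE))  # None
word = re.search(r"\bAS\b", "He is as tall", re.IGNORECASE)
print(repr(word.group(0)))  # 'as'

# Hypothetical full pattern with \b at both ends (an assumption, for illustration).
BOUNDED = r"\b(AS)?\W*(THE)?\W*CUSTODIAN\W*(FOR)?\W*(OF)?\W*(THE)?\b"
m = re.search(BOUNDED, "John Doe as the custodian of the trust", re.IGNORECASE)
print(repr(m.group(0)))  # 'as the custodian of the'
```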
One advantage of keeping '\W*' between the optional words is that it will also match the line feeds, carriage returns and other characters that appear when the required phrase is split over multiple lines.
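Here is a minimal Python check, using the same hypothetical '\b'-bounded pattern as an assumption, that the '\W*' between words spans a Windows-style line break in a made-up input.

```python
import re

# Hypothetical pattern with \b at both ends and \W* between the words.
BOUNDED = r"\b(AS)?\W*(THE)?\W*CUSTODIAN\W*(FOR)?\W*(OF)?\W*(THE)?\b"

# Made-up input where the phrase is broken across a "\r\n" line ending.
text = "John Doe as the\r\ncustodian of the trust"

m = re.search(BOUNDED, text, re.IGNORECASE)
print(repr(m.group(0)))  # 'as the\r\ncustodian of the'
```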
If you want to match an optional word, you may be better off using the '?' quantifier (zero or one occurrence) rather than '*'. For example, '\W*(AS)*\W*' will happily consume "asasasasas" in full, whereas '\W*(AS)?\W*' allows at most a single "as".
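The difference between the two quantifiers, sketched in Python (assumed language) on the same contrived string:

```python
import re

text = "asasasasas"
star = re.search(r"\W*(AS)*\W*", text, re.IGNORECASE)
optional = re.search(r"\W*(AS)?\W*", text, re.IGNORECASE)

print(star.group(0))      # 'asasasasas' ('*' repeats the group as often as it can)
print(optional.group(0))  # 'as' ('?' allows zero or one occurrence only)

# Requiring the whole string to match makes the difference explicit.
print(re.fullmatch(r"(AS)?", text, re.IGNORECASE))  # None
```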
So far I would recommend that you use the pattern:
Now consider the (rather unlikely) test string that you have provided, "JOHN DOE AS THE CUSTODIAN FOR ..." (you would probably never have the ellipsis, but it is useful for illustrating a point). You get a match of:
AS THE CUSTODIAN FOR ...
Here you can see that the '\W*' after '(FOR)?' is picking up not only the whitespace but also the ellipsis (and, as discussed above, any other non-word character). We have got rid of the trailing characters when the last matched word is "THE", but not otherwise.
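The effect can be reproduced in Python (assumed language) with two simplified, hypothetical pattern fragments: one that ends in '\W*' after the optional word and one that ends in '\b' instead.

```python
import re

text = "JOHN DOE AS THE CUSTODIAN FOR ..."

# Trailing \W* after an optional word: swallows " ..." as well.
loose = re.search(r"CUSTODIAN\W*(FOR)?\W*", text)
# Ending on a zero-width \b instead: the match stops right after "FOR".
tight = re.search(r"CUSTODIAN\W*(FOR)?\b", text)

print(repr(loose.group(0)))  # 'CUSTODIAN FOR ...'
print(repr(tight.group(0)))  # 'CUSTODIAN FOR'
```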
If you do not need the optional words to appear in a specific order, you could use a pattern such as:
This effectively applies the '\b' before whichever optional word comes first and after whichever optional word comes last. It also allows only whitespace characters between words. Its disadvantage is that it will also match nonsense such as:
the as as the custodian of for the of the for
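One possible order-free pattern of this kind, sketched in Python, is shown below. The exact pattern is an assumption for illustration: it uses non-capturing groups, allows any mix of AS/THE before CUSTODIAN and FOR/OF/THE after it, and requires whitespace ('\s+') between words. `find_phrase` is a hypothetical helper.

```python
import re

# Hypothetical order-free pattern: any mix of AS/THE before CUSTODIAN,
# any mix of FOR/OF/THE after it, separated only by whitespace (\s+).
UNORDERED = r"\b(?:(?:AS|THE)\s+)*CUSTODIAN(?:\s+(?:FOR|OF|THE))*\b"

def find_phrase(text):
    """Return the matched phrase, or None if there is no match."""
    m = re.search(UNORDERED, text, re.IGNORECASE)
    return m.group(0) if m else None

print(find_phrase("JOHN DOE AS THE CUSTODIAN FOR ..."))  # AS THE CUSTODIAN FOR
nonsense = "the as as the custodian of for the of the for"
print(find_phrase(nonsense) == nonsense)  # True: the whole string matches
```

The ellipsis is no longer picked up, but the price is that word order and repetition are no longer checked.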