sindizzy: This does not work in VB.NET: #D\d{3}(.+?REMARKS[^\r\n]*)#s. I don't think the # is supported.
sindizzy
This is one of the reasons why I tend to post my examples as the raw pattern or replacement text, without quotes etc. It is also one of the reasons why the posting guidelines ask about the regex and programming language that you are using (and I know the OP stated the use of VB.NET).
Finally it is why the posters to this forum CANNOT simply take what is suggested WITHOUT THINKING.
The suggested pattern above clearly has the rider that it was PHP code - therefore the '#'s are being used as pattern delimiters. If you didn't understand the pattern that was being suggested, then you should have asked for an explanation or at least indicated where you were getting lost.
You are right - the '#' is NOT part of the VB.NET syntax. But if you really do understand VB.NET and have looked at the Microsoft documentation for the functions that you are using, you would have realised that those functions expect a string, and VB.NET does not have a string form with leading and trailing '#'s. Therefore you will need to do some work for yourself and convert this into a form that is acceptable to the VB.NET functions.
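As a rough illustration of that conversion (sketched in Python rather than VB.NET, so treat it as an analogy and not as the VB.NET code): the '#' delimiters are dropped and the trailing 's' flag becomes an option passed alongside the pattern. In .NET the corresponding option is RegexOptions.Singleline; here it is re.DOTALL. The sample text is invented for the example.

```python
import re

# PHP form from the earlier post: '#D\d{3}(.+?REMARKS[^\r\n]*)#s'
# The '#' characters are only delimiters; the trailing 's' means
# "dot also matches line terminators".
PATTERN = r"D\d{3}(.+?REMARKS[^\r\n]*)"

# Hypothetical sample text, made up for this illustration.
SAMPLE = "D123 first line\r\nsecond line\r\nREMARKS none today\r\nD456 other\r\n"

# re.DOTALL plays the role of the PHP 's' flag.
match = re.search(PATTERN, SAMPLE, re.DOTALL)
captured = match.group(1) if match else None
print(repr(captured))
```

Without the DOTALL option the '.+?' cannot cross a line break, so the same pattern would fail on this sample.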
The regexp I have currently D\d{3}[A-Z]{0,1}(.|\s)*?REMARKS(.|\s)*?(?=\r) seems to work but is relatively slow and sometimes crashes in my project. I test in Expresso first and it runs but in my project it takes a long time and sometimes will make the app crash.
I must admit that I have not investigated this too much, but I suspect the problem is in the '(.|\s)*?(?=\r)' part. What this will do is match any character and then look forward for a carriage-return character. (By the way, using either the '.' with the 'singleline' option or the [\S\s] character set will do the same thing as '(.|\s)'.) If the 'REMARKS' keyword is at the end of a line then it will be immediately followed by a line terminator (and as this is a Windows platform it may well be a '\r\n' pair unless this has been converted into '\n' by something else), in which case the lookahead succeeds straight away.
Otherwise, I'm guessing that the regex engine works like this: because you have made this a lazy quantifier, it will first check the lookahead without consuming anything. If that fails, it will go back to the '(.|\s)' part, match one character and check the lookahead again. It will keep doing this until it finally reaches a position that is immediately followed by a '\r'. However, in doing so it will have left behind a saved state for every character it consumed so that it could backtrack - this may cause your program to crash when it runs out of stack space.
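A small sketch of the two points above, again in Python and with an invented sample string: the three ways of writing "any character, line terminators included" consume the same text, and the lazy repetition followed by the lookahead stops just before the first '\r'. A non-capturing group '(?:...)' is used here instead of '(...)' purely to keep the example tidy.

```python
import re

# Hypothetical sample with Windows line terminators.
SAMPLE = "REMARKS ok\r\nnext line\r\n"

# Three spellings of "any character, including line terminators".
alternation = re.match(r"(?:.|\s)*", SAMPLE).group(0)
char_class = re.match(r"[\S\s]*", SAMPLE).group(0)
dot_all = re.match(r".*", SAMPLE, re.DOTALL).group(0)

# Lazy repetition plus lookahead: consumes one character at a time
# until the next character is a carriage return.
lazy_tail = re.search(r"REMARKS(?:.|\s)*?(?=\r)", SAMPLE).group(0)

print(alternation == char_class == dot_all)
print(repr(lazy_tail))
```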
On the other hand, if it never finds the '\r' (for example, when the line terminators have been converted to '\n'), then the lazy quantifier will keep extending, one character at a time, until it reaches the end of the text. Only then will the regex engine start to backtrack, and it has to retry from every other position where the earlier parts of the pattern could match before it can declare the final failure. This will take a long time.
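Finally, the catastrophic part may well be the interplay between the two instances of this subpattern on either side of the 'REMARKS' keyword. This can lead to an exponential explosion of combinations that the regex engine must check. The alternation itself does not help: a space or a tab matches both '.' and '\s', so on a failing path the engine may try both alternatives for each such character.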
I would suggest something like (untested):
((?!REMARKS)[\S\s])*REMARKS[^\r]*
as an alternative - this should never need to backtrack as each step forward is deterministic.
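Here is a minimal sketch of how that alternative might be tried out, in Python and on invented sample text. Putting the 'D\d{3}' prefix from the earlier patterns in front is my assumption about how the pieces fit together, and the group is written as non-capturing since its content is not needed.

```python
import re

# Suggested pattern with the 'D\d{3}' prefix added in front (assumption).
# '(?!REMARKS)' refuses to step over the keyword, so the repetition
# stops at the first 'REMARKS' after the record code.
PATTERN = re.compile(r"D\d{3}(?:(?!REMARKS)[\S\s])*REMARKS[^\r]*")

# Hypothetical sample text with two records.
SAMPLE = (
    "D123 first record\r\n"
    "detail line\r\n"
    "REMARKS checked\r\n"
    "D456 second record\r\n"
    "REMARKS\r\n"
)

records = [m.group(0) for m in PATTERN.finditer(SAMPLE)]
for record in records:
    print(repr(record))
```

Each match runs from the record code to the end of the 'REMARKS' line, without the trailing '\r\n'.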
I bet that when you are using Expresso you are using a small test text but in your program you have a much longer text to scan. You need to look at the timing information that Expresso is providing you and also provide positive, negative and borderline test cases.
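To give an idea of what I mean by test cases, here is a rough harness in Python. The pattern is the alternative suggested above with an assumed 'D\d{3}' prefix, and every piece of test data is invented. It also times one search over a longer generated text, since that is closer to what your program sees.

```python
import re
import time

PATTERN = re.compile(r"D\d{3}(?:(?!REMARKS)[\S\s])*REMARKS[^\r]*")

# (description, text, expected match or None); all data is made up.
CASES = [
    ("positive: keyword with trailing text",
     "D123 a\r\nREMARKS fine\r\nrest", "D123 a\r\nREMARKS fine"),
    ("negative: no keyword at all", "D123 a\r\nb\r\n", None),
    ("negative: no record code", "X12 a\r\nREMARKS fine\r\n", None),
    ("borderline: keyword ends the text",
     "D123 a\r\nREMARKS", "D123 a\r\nREMARKS"),
    # With bare '\n' endings '[^\r]*' runs on to the end of the text.
    ("borderline: bare '\\n' line endings",
     "D123 a\nREMARKS fine\nrest", "D123 a\nREMARKS fine\nrest"),
]

results = []
for description, text, expected in CASES:
    found = PATTERN.search(text)
    actual = found.group(0) if found else None
    results.append((description, actual == expected))
    print(("ok   " if actual == expected else "FAIL ") + description)

# A longer text, to see how the pattern behaves outside a small sample.
long_text = "D123 " + "filler line\r\n" * 5000 + "REMARKS end\r\n"
start = time.perf_counter()
long_match = PATTERN.search(long_text)
elapsed = time.perf_counter() - start
print(f"long text: {len(long_text)} characters, {elapsed:.4f} s")
```

The last borderline case is worth a look: if the line endings have been converted to '\n', the match no longer stops at the end of the 'REMARKS' line.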
Susan