JavaScript is fast becoming the bane of my existence, at least as far as regexes go.
I recently blogged about how JavaScript behaves differently from .NET on the same valid pattern, and I’ve found another case of this. I’ve also blogged about testing with the same regex engine you plan to use your regex with, a lesson I’ve yet to learn myself. I recently wrote several versions of my date regexes that I deemed JavaScript-safe. One small problem: they didn’t work in JavaScript, at least not 100%. To be fair to myself, even if I had tested them with a JavaScript engine I still might not have found the main problem, since it was neither a syntax nor a logic error. Actually, one did have a (JavaScript) syntax error, but that wasn’t the real problem.
The approach I used in writing these regexes was to take the .NET version as the base and use only syntax I knew JavaScript supported. That basically meant removing the named groups and reworking the constructs JavaScript didn’t support: the lookbehinds and the conditional. I eventually came up with expressions that followed the same logic but used JavaScript-valid syntax (except for one where I accidentally left a named group in). My “mistake” was doing the bulk of my testing in the Regulator, the tool I used to write them. Now, the Regulator didn’t do anything wrong. The tests passed with flying colors, since I’m sure the Regulator uses the .NET engine for matching. I did a few tests with JavaScript and they passed too. I didn’t test much with JavaScript because I figured that if the syntax was valid, the logic would work the same in both JavaScript and .NET. Silly me.
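As a rough illustration of the named-group rewrite, here is a hypothetical, simplified yyyy/m/d pattern (not one of the actual date regexes from this post). A .NET group such as (?<year>\d{4}) becomes a plain numbered group, and the calling code reads the captures by position instead of by name. Named groups only arrived in JavaScript with ES2018, so engines of this era rejected them. The names ymdNumbered and parseYmd are made up for the sketch.

```javascript
// Hypothetical .NET-style source pattern with named groups:
//   ^(?<year>\d{4})/(?<month>\d{1,2})/(?<day>\d{1,2})$
// JavaScript-safe rewrite for older engines: plain numbered groups
// (and "/" escaped, since it delimits the regex literal).
const ymdNumbered = /^(\d{4})\/(\d{1,2})\/(\d{1,2})$/;

function parseYmd(text) {
  const m = ymdNumbered.exec(text);
  if (m === null) return null;
  // Captures are read by position: 1 = year, 2 = month, 3 = day.
  return { year: Number(m[1]), month: Number(m[2]), day: Number(m[3]) };
}

console.log(parseYmd("2004/2/29")); // { year: 2004, month: 2, day: 29 }
```

This only shows the syntax change; it does no range checking on month or day.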
The problem was that the months with 30 days (April, June, September and November, i.e. 4, 6, 9 and 11) failed every check. I couldn’t figure out why, because the regexes worked like a charm in the Regulator and nothing in them was .NET-specific.
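To make the 30-day-month case concrete, here is a hypothetical m/d check (not the regex from this post) that uses a negative lookahead to reject day 31 for months 4, 6, 9 and 11 before the general month/day alternatives run. A negative lookahead containing character classes is presumably the kind of construct the Mozilla bug discussed later in this post tripped over.

```javascript
// Hypothetical m/d validator (month 1-12, day 1-31, optional leading zeros).
// The negative lookahead rejects ".../31" for the 30-day months 4, 6, 9, 11
// before the general month/day alternatives are tried.
// February is deliberately not handled here (2/31 would still pass).
const monthDay =
  /^(?!(?:0?[469]|11)\/31$)(?:0?[1-9]|1[0-2])\/(?:0?[1-9]|[12]\d|3[01])$/;

const samples = ["4/30", "4/31", "6/31", "11/31", "12/31", "1/31"];
for (const s of samples) {
  // Expected: 4/30 true, 4/31 false, 6/31 false, 11/31 false, 12/31 true, 1/31 true
  console.log(s, monthDay.test(s));
}
```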
I’d changed the expression on RegExLib, but I still wasn’t 100% sure what the “bug” was. There seems to be a problem with the negative lookahead. At first I thought it didn’t allow special characters inside it, because none of them behaved the way I expected or the way they did in .NET. I started to think it was a short-circuit issue: the JS engine seems to negate each character before evaluating it and stop at the first non-match, whereas I think .NET evaluates the entire group and then negates the result.
Well, a little further testing shows that this problem seems to be Mozilla-specific. A test string that failed in Mozilla and Firefox passed in Opera and IE. It looks as if character classes inside the lookahead are short-circuiting. Take a negative lookahead that uses a character class, say ^\d(?!\.\d5), which should match a digit that is not followed by a decimal point, then a digit, then a 5. 1.25 shouldn’t match, but 3.14 should.
The Mozilla bug would cause 3.14 to fail: it would negate the \d, making it “not a digit” (\D), and when the ‘1’ didn’t meet that criterion it would end the check without looking at the rest of the lookahead to see whether the next character is a 5. I reported this bug to Mozilla as Bugzilla bug 254296, and it has since been fixed.
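Here is the ^\d(?!\.\d5) example as a quick check you can paste into a browser console or Node. The inputs 7.05 and 9.5 are extra cases added for illustration. A correct engine should print “ok” on every line; under the bug described above, 3.14 would presumably come back false.

```javascript
// Pattern from the text: a leading digit NOT followed by "." + a digit + "5".
const noDecimal5 = /^\d(?!\.\d5)/;

// Expected results for an engine that evaluates the whole lookahead sequence
// and only then negates the outcome.
const expectations = [
  ["1.25", false], // ".25" matches \.\d5, so the negative lookahead fails
  ["3.14", true],  // ".14" does not end in 5, so the negative lookahead passes
  ["7.05", false], // ".05" matches \.\d5
  ["9.5", true],   // "." + "5" leaves nothing for the final 5
];

for (const [input, want] of expectations) {
  const got = noDecimal5.test(input);
  console.log(input, got, got === want ? "ok" : "engine disagrees");
}
```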
Just when I thought it was safe to go back in the water, another bug came up and bit me. This time it was Internet Explorer (IE). After testing a modified version of the regex that got around the Mozilla bug, I found that IE wouldn’t match the leap-year check. That portion of the expression used a lookahead. Mind you, this same regex worked fine against the same test value in .NET, and even in Mozilla using JavaScript. But not with JavaScript (well, JScript) in IE. This bug was even stranger than the first. I’m still not exactly sure what the problem is, but I tracked it down to an alternation choice inside the lookahead: the point where I check for the days in February that aren’t the 29th. It’s not the only alternation in the expression, just the last one. If I removed that portion, the leap-year check worked fine, at the expense of all the other days in February. If I added it back, or even put a modified version in its place, the whole leap-year validation would fail, but only in IE using JScript.
I tried for a while to “fix” this problem, but there was really nothing to fix. Nothing was wrong with the regex; it worked in four out of five cases using exactly the same test value. I was trying my best not to rewrite the whole thing, and besides, some older versions of my date regexes that were JS-compatible didn’t suffer from this strangeness. The logic of the construction was quite different between the old and new versions, so a rewrite might have had to be total.
I finally found a workaround that both browsers accept, with JavaScript and JScript alike, by adding another alternation option for the February check outside the lookahead. But I shouldn’t have had to.
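For comparison, here is a sketch of a yyyy/m/d validator that gives Feb 29 its own top-level alternative, gated by a leap-year pattern for the year, instead of placing that choice inside a lookahead. It is not the RegExLib pattern 763 and not necessarily the same as the workaround described above; the names leapYear, anyYear, monthDayPart and ymdLeap are hypothetical. Leap years are assumed to follow the Gregorian rule (divisible by 4, except centuries not divisible by 400).

```javascript
// Hypothetical yyyy/m/d validator (not RegExLib pattern 763).
// Leap year: last two digits a non-zero multiple of 4, or a century whose
// first two digits are a multiple of 4 (2000 qualifies, 1900 does not).
const leapYear =
  "(?:\\d\\d(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)";
const anyYear = "\\d{4}";
const monthDayPart =
  "(?:(?:0?[13578]|1[02])/(?:0?[1-9]|[12]\\d|3[01])" + // 31-day months
  "|(?:0?[469]|11)/(?:0?[1-9]|[12]\\d|30)" +           // 30-day months
  "|0?2/(?:0?[1-9]|1\\d|2[0-8]))";                     // Feb 1-28, any year

// Feb 29 is a separate top-level alternative, with no lookahead involved.
const ymdLeap = new RegExp(
  "^(?:" + leapYear + "/0?2/29|" + anyYear + "/" + monthDayPart + ")$"
);

for (const d of ["2004/2/29", "2000/2/29", "1900/2/29", "2003/2/29", "2004/4/31", "2004/12/31"]) {
  // Expected: true, true, false, false, false, true
  console.log(d, ymdLeap.test(d));
}
```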
Now, while both problems had something to do with lookaheads, you should still test in multiple browsers if you are going to do any client-side validation with regexes, to see if any other bugs are scampering around.
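One low-tech way to follow that advice is a small table-driven check you can run in each browser’s console. The helper name checkPattern and the month-number pattern used to exercise it are hypothetical; swap in your own regex and test strings.

```javascript
// Hypothetical helper: run one regex against a table of [input, shouldMatch]
// pairs and return every case where this engine disagrees.
function checkPattern(re, cases) {
  const failures = [];
  for (const [input, shouldMatch] of cases) {
    re.lastIndex = 0; // only matters if the regex has the g or y flag
    const matched = re.test(input);
    if (matched !== shouldMatch) {
      failures.push({ input, expected: shouldMatch, actual: matched });
    }
  }
  return failures;
}

// Example: a month-number pattern (1-12, optional leading zero).
const monthCases = [
  ["4", true],
  ["04", true],
  ["12", true],
  ["0", false],
  ["13", false],
];
const failures = checkPattern(/^(?:0?[1-9]|1[0-2])$/, monthCases);
console.log(failures.length === 0 ? "all cases passed" : failures);
```

Running the same table in every target browser turns “works on my machine” into a list of concrete disagreements.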
For the moment I'm going to leave this yyyy/mm/dd regex (http://www.regexlib.com/RETester.aspx?regexp_id=763) as it is, so you can see the IE bug for yourself if you'd like. Use the JavaScript option and test “2004/2/29”, or any other valid Feb 29 date, in IE, Mozilla, Opera and any other browser at your disposal.