About a year after I posted my original date regex (http://www.regexlib.com/REDetails.aspx?regexp_id=113) I saw that several other date regexes had been posted. Many used my pattern as a base. The new expressions were different, though the changes seemed mainly cosmetic: things like allowing only one type of separator or making the year always 4 digits. Of course, my original hope was that my regex would be a very flexible base that could be modified to suit a specific need. However, seeing several new variants being posted, I didn’t see that they really offered anything new. Other than not using backreferences, most didn’t look much different. The biggest change was making the regex match dd/mm/yyyy instead of mm/dd/yyyy, which mainly involves switching the order of the checks. I had been asked for a dd/mm/yyyy version myself and was about to post it after the 3rd request, but someone had beaten me to it (http://www.regexlib.com/REDetails.aspx?regexp_id=151). But this was the most drastic change. The ones that worked didn’t work any better than any other and were usually written in almost exactly the same fashion. Some used character classes, others used the equivalent shorthand (\d instead of [0-9]), but they all approached the problem pretty much the same way.
It had been over a year since my original post. I had learned a lot more about regexes since then, so I decided to see if I could improve on my creation. I was determined to take a different approach to the whole expression, something more than only accepting 4 digit years (which I also did). But let’s face facts: a date is a date. There are 12 months and up to 31 days in a month. The leap year check was pretty solid; I didn’t see any way to improve or trim it (at the time), but months and days were another story. I realized there was a new approach that would work just as well but make the expression smaller.
The standard approach is
1) check for the months with 31 days then see if the day is between 1 and 31
2) if check 1 fails check for the months with 30 days then see if the day is between 1 and 30
3) if the month is February see if the day is between 1 and 29, and if the day is the 29th see if the year is a leap year
With the exception of the leap year check in step 3, the regexes that predate my leap year check used this approach or a variation.
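As a rough illustration, the three steps above could be written as one alternation per step. This is only a sketch in Python, not any of the posted expressions: the names YEAR, LEAP_YEAR and STANDARD are made up for the example, and it assumes "/" as the only separator and years 1600-9999.

```python
import re

# Hypothetical building blocks for this sketch; "/" is the only separator.
YEAR = r"(?:1[6-9]|[2-9]\d)\d\d"  # 1600-9999
LEAP_YEAR = (
    r"(?:(?:1[6-9]|[2-9]\d)(?:0[48]|[2468][048]|[13579][26])"  # divisible by 4, not a century year
    r"|(?:16|[2468][048]|[3579][26])00)"                        # century year divisible by 400
)

STANDARD = re.compile(
    "^(?:"
    # 1) months with 31 days, day 1-31
    r"(?:0?[13578]|1[02])/(?:0?[1-9]|[12]\d|3[01])/" + YEAR +
    # 2) months with 30 days, day 1-30
    r"|(?:0?[469]|11)/(?:0?[1-9]|[12]\d|30)/" + YEAR +
    # 3) February: day 1-28 in any year, day 29 only in a leap year
    r"|0?2/(?:(?:0?[1-9]|1\d|2[0-8])/" + YEAR + "|29/" + LEAP_YEAR + ")"
    ")$"
)
```

Note how the day ranges in steps 1 to 3 all spell out days 1-28 again.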
Now, I don’t know whether at the time I was thinking of this “are you really thinking about the question” question or not.
Q: How many months have 28 days?
A: All of them (you didn’t say one did you?).
I did realize that with the standard approach days 1-28 were involved in multiple checks.
So really you only need to check the month against days 29-31,
which is what I did in my original regex:
1) Check month with 31 days to see if the day is the 31st
2) Check months with 30 or 31 days and see if the day is the 29th or 30th
3) If the month and day are Feb 29, do the leap year check
4) Check all months to see if day is 1-28
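The four checks can be sketched the same way. Again this is only an illustration in Python under simplifying assumptions ("/" as the only separator, 4 digit years from 1600 to 9999), with made-up names, and not the posted expression itself.

```python
import re

# Hypothetical building blocks for this sketch; "/" is the only separator.
YEAR = r"(?:1[6-9]|[2-9]\d)\d\d"  # 1600-9999
LEAP_YEAR = (
    r"(?:(?:1[6-9]|[2-9]\d)(?:0[48]|[2468][048]|[13579][26])"  # divisible by 4, not a century year
    r"|(?:16|[2468][048]|[3579][26])00)"                        # century year divisible by 400
)

ORIGINAL = re.compile(
    "^(?:"
    r"(?:0?[13578]|1[02])/31/" + YEAR +                       # 1) the 31st, 31-day months only
    r"|(?:0?[13-9]|1[0-2])/(?:29|30)/" + YEAR +                # 2) the 29th/30th, every month but February
    r"|0?2/29/" + LEAP_YEAR +                                  # 3) Feb 29 needs a leap year
    r"|(?:0?[1-9]|1[0-2])/(?:0?[1-9]|1\d|2[0-8])/" + YEAR +    # 4) days 1-28, any month
    ")$"
)
```

Days 1-28 now appear once, but a month such as 12 still shows up in checks 1, 2 and 4.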
I saw that although I was only checking days 1-28 once, I was checking some months 2 or 3 times.
Which gets us back to the question: “How many months have 28 days?” So if the day is between 1 and 28, do you really care which month it is?
This led me to change my approach to the months. The months with 31 days are involved in three of the checks, so if the month portion of your input is one of those values you will get a hit in up to 3 places. So why does it need to be in 3 different places? Like I said before, there are only 12 months, none of which has more than 31 days. There are only 5 months without a day 31 and only one without a day 30. I realized that if, while checking the month, I used a negative look-ahead for those 5 months to rule out the days they couldn’t have, then when I got to checking the day I would only have to make sure it was in the range 1 to 31. By doing this I don’t have to check the same days or months in multiple places. And by also checking the validity of leap years with a look-ahead, you only have to ensure your input ends in your range of years.
The month check becomes
((0?[13578]|1[02])|(0?[469]|11)(?!.31)|0?2(?(.29)(?=.29.((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|(16|[2468][048]|[3579][26])00))|(?!.3[01])))
(0?[13578]|1[02]) are the months with 31 days
(0?[469]|11)(?!.31) are the months with 30 days
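To see the negative look-ahead on the 30-day months in isolation, here is a small Python sketch. The name MONTH_30_31 is hypothetical, and the leading ^ and trailing "/" are added only so the month is pinned down in this demo; they are not part of the expression in the text.

```python
import re

# Hypothetical name; "^" and the trailing "/" only pin the month down for this demo.
MONTH_30_31 = re.compile(
    r"^(?:"
    r"(?:0?[13578]|1[02])"      # 31 days: no restriction needed
    r"|(?:0?[469]|11)(?!.31)"   # 30 days: must not be followed by <separator>31
    r")/"
)
```

The look-ahead peeks at the separator and the day without consuming them, so the day itself is still left for a later part of the expression to match.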
Now February requires some special treatment, because to determine if the 29th is a valid day you also need to know the year.
So first you check whether the month is February AND the day is the 29th
0?2(?(.29)Leap year | Non Leap year)
The (?(expression)yes|no) construct says: if there is a match for the (lookahead) expression, evaluate the yes part, otherwise do the no part.
In our case if the day is 29 see if the year is a leap year
(?=.29.((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|(16|[2468][048]|[3579][26])00))
Remember that the lookahead doesn’t consume any characters, so you have to include the 29 and the date separators in the leap year check (.29.)
Now if the day wasn’t the 29th just make sure it wasn’t the 30th or 31st
(?!.3[01])
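Python's re module only accepts a group number or name as the condition in (?(...)yes|no), so the February branch cannot be pasted in as written there. The sketch below emulates it with two alternatives guarded by (?=.29) and (?!.29); that substitution, the names, and the leading ^ are assumptions made for this example.

```python
import re

# Hypothetical names; leap years between 1600 and 9999.
LEAP_YEAR = (
    r"(?:(?:1[6-9]|[2-9]\d)(?:0[48]|[2468][048]|[13579][26])"  # divisible by 4, not a century year
    r"|(?:16|[2468][048]|[3579][26])00)"                        # century year divisible by 400
)

FEBRUARY = re.compile(
    r"^0?2(?:"
    r"(?=.29)(?=.29." + LEAP_YEAR + r")"   # "yes" arm: the day is the 29th, so the year must be a leap year
    r"|(?!.29)(?!.3[01])"                  # "no" arm: not the 29th, so just rule out the 30th and 31st
    r")"
)
```

Only the month is consumed here; the day and year are inspected purely through look-aheads.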
Now all that is left is to see if the day is in the range 1-31 and the year is in the range 1600-9999. Days in the range but invalid for certain months will fail before getting this far. Values for day that pass the month check (e.g. 32) but are not in the range fail here.
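Putting the pieces together, a minimal Python sketch of the whole mm/dd/yyyy check might look like the following. It is not the posted expression: the names are hypothetical, the February conditional is emulated with plain look-aheads because Python's re module has no conditional on a look-ahead, and the separator set [/.-] with a backreference is an assumption.

```python
import re

# Hypothetical names; an illustration of the approach, not the posted regex.
LEAP_YEAR = (
    r"(?:(?:1[6-9]|[2-9]\d)(?:0[48]|[2468][048]|[13579][26])"  # divisible by 4, not a century year
    r"|(?:16|[2468][048]|[3579][26])00)"                        # century year divisible by 400
)
MONTH = (
    r"(?:(?:0?[13578]|1[02])"               # 31-day months
    r"|(?:0?[469]|11)(?!.31)"               # 30-day months, never the 31st
    r"|0?2(?:(?=.29." + LEAP_YEAR + r")"    # February 29th needs a leap year
    r"|(?!.29)(?!.3[01])))"                 # otherwise February has no 29th, 30th or 31st
)
DATE = re.compile(
    MONTH
    + r"([/.-])"                    # separator, captured so both separators must agree
    + r"(?:0?[1-9]|[12]\d|3[01])"   # day 1-31, checked once for every month
    + r"\1"
    + r"(?:1[6-9]|[2-9]\d)\d\d"     # year 1600-9999
)

def is_valid_date(text):
    """Return True if text is a valid mm/dd/yyyy date under the assumptions above."""
    return DATE.fullmatch(text) is not None
```

Each month and each day range appears exactly once; everything month-specific is handled by look-aheads before the day is consumed.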
You can see examples of expressions using this logic here:
mm/dd/yyyy format: http://www.regexlib.com/REDetails.aspx?regexp_id=504
dd/mm/yyyy format: http://www.regexlib.com/REDetails.aspx?regexp_id=505
Both of these expressions also validate times.