When I first began working with regex, one of the very first expressions I wrote dealt with dates (http://www.regexlib.com/REDetails.aspx?regexp_id=113). The unique thing about it when I posted it was that it correctly handled dates in leap years. Since then, many people have used it or copied and modified it, but I don’t know how many understand how or why it works.

I had found regexlib (www.regexlib.com) and, after seeing all the samples, wanted to see if I could write an expression of my own. There were already a few other date regexes there. Most simply matched the xx/xx/xxxx pattern we are used to seeing dates in.

\d{1,2}/\d{1,2}/\d{2,4}

A few went so far as to limit the months to 1-12 and the days to 1-31. One or two went even further and limited which months could have 31 days and which could have 30. For mm/dd/yyyy the approach is very straightforward.

Check to see if the month is one that can have 31 days

0?[13578]|1[02]

then see if the day is 31

(0?[13578]|1[02])/31

if that failed do a similar check for months with 30 days

(0?[13-9]|1[012])/30

otherwise, and here’s where the trouble starts, check the days less than thirty for all months. All of the regexes I had seen balked on February. Some put it in with the check for 30-day months. Others put it in the less-than-30-day check, allowing it to always have 29 days.

(0?[1-9]|1[012])/([12]\d|0?[1-9])

The three checks are or’ed together then finally the year is checked.
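
To make the February problem concrete, here is a minimal Python sketch of the three checks or'ed together. The `\d{4}` year part and the name `naive_is_date` are assumptions for illustration only, and the month class for the 30-day check is written as `[13-9]`, since a comma inside a character class would match a literal comma.

```python
import re

# The three month/day checks, or'ed together, followed by a plain year check.
# The \d{4} year part is an assumption made for this sketch.
NAIVE_DATE = re.compile(
    r"^(?:(?:0?[13578]|1[02])/31"               # months that have 31 days
    r"|(?:0?[13-9]|1[012])/30"                  # every month except February
    r"|(?:0?[1-9]|1[012])/(?:[12]\d|0?[1-9])"   # days 1-29 for all months
    r")/\d{4}$"
)


def naive_is_date(text: str) -> bool:
    """Return True if text passes the naive mm/dd/yyyy check."""
    return NAIVE_DATE.match(text) is not None


print(naive_is_date("4/30/2004"))  # True
print(naive_is_date("4/31/2004"))  # False
print(naive_is_date("2/29/2003"))  # True, although 2003 is not a leap year
```

The last line is the weakness described above: February 29 is accepted in every year.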

I was still pretty new to regex, and I didn’t see any way to do division with it, so checking for leap years seemed impossible. I remembered learning as a child that a leap year isn’t simply every 4 years, as a lot of people think, and though I didn’t always remember the second part of the rule, I was aware of it. So I really wasn’t planning to do that check in my regex, but then I gave it a little thought and realized I could check leap years. First, the leap year rule is this:

According to the Gregorian calendar, which is the civil calendar in use today, years evenly divisible by 4 are leap years, with the exception of centurial years that are not evenly divisible by 400. Therefore, the years 1700, 1800, 1900 and 2100 are not leap years, but 1600, 2000, and 2400 are leap years.

(source: http://aa.usno.navy.mil/faq/docs/leap_years.html)

Multiples of 5 always end with one of the same 2 digits (0 or 5), so if you are checking for a multiple of five you only need to see if the last digit is 0 or 5. You can’t just check the last digit with multiples of 4. Multiples of 4 always end with an even digit, but ending with an even digit doesn’t make a number a multiple of 4.

But after running through the times table in my head I realized the last digits of the multiples of 4 do form a repeating pattern, one that repeats every 20. I then realized that the digit in the tens column is always even for years ending in 0, 4 and 8 and odd for years ending in 2 and 6.

So we get

[02468][048]|[13579][26]
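
The even/odd tens-digit observation can be checked by brute force. This is a small Python sketch (the variable names are mine) that compares the two-digit pattern, written without spaces, against every value from 00 to 99.

```python
import re

# Even tens digit with 0, 4 or 8; odd tens digit with 2 or 6.
DIVISIBLE_BY_4 = re.compile(r"[02468][048]|[13579][26]")

# Every two-digit string (00-99) that the pattern accepts in full.
matched = {n for n in range(100) if DIVISIBLE_BY_4.fullmatch(f"{n:02d}")}
expected = set(range(0, 100, 4))

print(matched == expected)  # True
```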

So our leap regex begins as

(1[6-9]|[2-9]\d)([02468][048]|[13579][26])

where (1[6-9]|[2-9]\d) is the century and ([02468][048]|[13579][26]) is the divisible-by-4 leap year check.

That only leaves the centurial years evenly divisible by 400, but if you stop to think about it, it is the same pattern, just in the hundreds and thousands columns when the tens and ones columns are both 0. So first you take the 00 match out of the above expression, because you don’t always want to match it, and change it to

0[48]|[2468][048]|[13579][26]

Then find the leap centuries

([2468][048]|[13579][26])00

Since the rule didn’t come into effect until 1582, I limited the range of dates to 1600 onward, so the match on 1200 should be removed. That is easily done by adding a check for 16 and removing the 1 from the thousands column, leaving us with

(16|[2468][048]|[3579][26])00

the full expression is

((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26]))|((16|[2468][048]|[3579][26])00)
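
One way to convince yourself that the leap-year expression follows the Gregorian rule is to compare it with a calendar library. The sketch below assumes Python's standard `calendar.isleap` as the reference and writes the expression without spaces and with non-capturing groups. The 1600-9999 range is the range the expression is designed for.

```python
import calendar
import re

LEAP_YEAR = re.compile(
    r"(?:1[6-9]|[2-9]\d)(?:0[48]|[2468][048]|[13579][26])"  # ordinary leap years
    r"|(?:16|[2468][048]|[3579][26])00"                     # leap centuries
)

# Years where the regex and the calendar module disagree.
mismatches = [
    year
    for year in range(1600, 10000)
    if bool(LEAP_YEAR.fullmatch(str(year))) != calendar.isleap(year)
]

print(mismatches)  # []
```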

In my original date regex I also allowed 2-digit years, which is done simply by making the century optional, (1[6-9]|[2-9]\d)?, in the expression. You’ll note that this still won’t validate 00. When I posted the regex I got several comments that this was an error or a bug. It wasn’t; I intentionally did not try to validate year 00. A two-digit 00 is ambiguous as to century, and since you need the century to fully determine leap years, at some point this regex is going to be wrong when dealing with a two-digit 00 year. Even though the most recent turn of the century, the year 2000, was a leap year, the intention is that this regex could be used for centuries to come. So the choice was to make it as accurate as possible by making it wrong once every 400 years instead of wrong 3 times in that same span.
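
The effect of the optional century can be seen with a few sample years. This is only an illustrative Python sketch; the name `LEAP_YEAR_2_OR_4` is hypothetical.

```python
import re

# Same leap-year check, but the century in the first alternative is optional.
LEAP_YEAR_2_OR_4 = re.compile(
    r"(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])"
    r"|(?:16|[2468][048]|[3579][26])00"
)

for year in ["04", "2004", "00", "2000", "1900"]:
    print(year, bool(LEAP_YEAR_2_OR_4.fullmatch(year)))
# 04 True
# 2004 True
# 00 False
# 2000 True
# 1900 False
```

A bare 00 is rejected while a fully written 2000 is accepted, which is the intentional behavior described above.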

The final expression is

^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
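
As a rough end-to-end check, the final expression can be compared against a date library for every month/day/year combination in a range. The sketch below is an assumption-laden test harness, not part of the original regex: it uses Python's `datetime.date` as the reference, only tests the `/` separator with four-digit years from 1600 to 2400, and writes the month class for the 29/30 check as `[13-9]`, since a comma inside a character class would match a literal comma.

```python
import datetime
import re

DATE_RE = re.compile(
    r"^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1"
    r"|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))"
    r"(?:(?:1[6-9]|[2-9]\d)?\d{2})$"
    r"|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?"
    r"(?:0[48]|[2468][048]|[13579][26])"
    r"|(?:(?:16|[2468][048]|[3579][26])00))))$"
    r"|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4"
    r"(?:(?:1[6-9]|[2-9]\d)?\d{2})$"
)


def regex_says_valid(month: int, day: int, year: int) -> bool:
    return DATE_RE.match(f"{month}/{day}/{year}") is not None


def calendar_says_valid(month: int, day: int, year: int) -> bool:
    try:
        datetime.date(year, month, day)
        return True
    except ValueError:
        return False


# Months 1-13 and days 1-32 so that out-of-range values are exercised too.
disagreements = [
    (m, d, y)
    for y in range(1600, 2401)
    for m in range(1, 14)
    for d in range(1, 33)
    if regex_says_valid(m, d, y) != calendar_says_valid(m, d, y)
]

print(disagreements)  # []
```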

In the final expression I allow the user to pick one of three commonly used field separators, (/), (.) or (-), but whichever one you choose has to be used for both the month-day and day-year separators. The non-capturing groups (?:) were added later, when I found out that some regex engines allow only 9 backreferences and the original was using backreferences in the 20’s, which I miscounted several times.
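
The separator trick is easier to see in isolation. The pattern below is a deliberately simplified, hypothetical one (it does no month, day or leap-year checking); only the captured separator and the `\1` backreference mirror the technique used in the final expression.

```python
import re

# Capture the first separator, then require the same one again with \1.
SAME_SEPARATOR = re.compile(r"^\d{1,2}(\/|-|\.)\d{1,2}\1\d{2,4}$")

for text in ["3/4/2004", "3-4-2004", "3.4.2004", "3/4-2004"]:
    print(text, bool(SAME_SEPARATOR.match(text)))
# 3/4/2004 True
# 3-4-2004 True
# 3.4.2004 True
# 3/4-2004 False
```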

Also note the final expression has boundary markers (beginning and end of string). They are very important, because without them an invalid date will match if it contains a valid date within it.

Ex. 13/4/2004 would match because 3/4/2004 is a match

2/29/2011 would match because 2/29/20 is matched by the two-digit year check
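
Both examples can be reproduced with a short Python sketch. It splits the final expression into its three alternatives so that the same text can be compiled with and without the `^` and `$` markers. The names `ALTERNATIVES`, `UNANCHORED` and `ANCHORED` are mine, and the month class for the 29/30 check is written as `[13-9]`.

```python
import re

# The three alternatives of the final expression, with ^ and $ removed.
ALTERNATIVES = [
    r"(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1"
    r"|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))"
    r"(?:(?:1[6-9]|[2-9]\d)?\d{2})",
    r"(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?"
    r"(?:0[48]|[2468][048]|[13579][26])"
    r"|(?:(?:16|[2468][048]|[3579][26])00))))",
    r"(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4"
    r"(?:(?:1[6-9]|[2-9]\d)?\d{2})",
]

UNANCHORED = re.compile("|".join(ALTERNATIVES))
ANCHORED = re.compile("|".join(f"^{alt}$" for alt in ALTERNATIVES))

for text in ["13/4/2004", "2/29/2011"]:
    loose = UNANCHORED.search(text)
    print(text, loose.group(0) if loose else None, bool(ANCHORED.search(text)))
# 13/4/2004 3/4/2004 False
# 2/29/2011 2/29/20 False
```

The middle column is the valid date found inside the invalid one when the markers are missing.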
