First off, let me say I'm a bit over my
head here. Not the regex part, but the host language of the regex engine.
Many moons ago I posted a blog article
stating why you could not write a regex that validated an e-mail
address 100%. Well, this is still true. However, in that
post I also stated that the pattern was so massive that it wasn't
worth using. This is also still true, but I was made aware of a
flavor-specific syntax that reduces the regex from massive to very
large.
This regex is for the PCRE engine.
http://www.myregextester.com/?r=337
Though from what I've read this will work for PHP too. Now, I don't know Perl or PHP, or what
minimum version of PCRE supports this syntax. That being the case, I
also don't know how well it performs. I wrote the original version using
the .NET syntax, and not only was the regex massive (which is one reason I never posted it), but the
performance was terrible. Given that most people want to use this
type of regex to validate a data entry field, the pattern was overkill.
In fact, I recommend that you don't use this, except to learn from.
The PCRE version may perform better, but I don't have the means or
time to test, so use at your own risk. For simple field validation
even this is still overkill. For a large text file, performance
may suffer horribly. Most likely you aren't going to want to use
this pattern, as it is too large for simple text and performs poorly
on large text.
When I see people asking for an email
regex, I point out that perfect validation is not possible. And when
I see so-called email-validating regexes that are only about 50
characters long, it makes me chuckle. This pattern is probably the
most compact version of an RFC 2822 address regex you'll find, and it
is still huge. Ports to other regex engines not supporting the
recursive syntax will easily be 4x as large as my .NET version was.
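
To give a feel for why recursion matters for size, here is a minimal Python sketch. It is not the pattern linked above; it only handles a simplified form of RFC 2822 comments, which are allowed to nest. Python's standard `re` module has no recursive syntax, so the nesting has to be unrolled to a fixed depth, and the pattern grows with every level. The name `comment_pattern` and the simplified character class are assumptions made for this illustration.

```python
import re

# Simplified comment content: any character except parentheses and
# backslash, or a backslash-escaped character (a "quoted-pair").
# This is an assumption for illustration, not the full RFC 2822 ctext rule.
CONTENT = r'(?:[^()\\]|\\.)'

def comment_pattern(depth):
    """Build a pattern for parenthesized comments nested up to `depth` levels."""
    pattern = r'\(' + CONTENT + r'*\)'  # depth 1: no nested comment allowed
    for _ in range(depth - 1):
        # Each extra level embeds the previous pattern as one more alternative.
        pattern = r'\((?:' + CONTENT + '|' + pattern + r')*\)'
    return pattern

# The pattern length grows with every supported nesting level.
for depth in (1, 2, 3, 4):
    print(depth, len(comment_pattern(depth)))

# Depth 2 accepts one level of nesting; depth 1 does not.
print(bool(re.fullmatch(comment_pattern(2), '(a (nested) comment)')))
print(bool(re.fullmatch(comment_pattern(1), '(a (nested) comment)')))
```

With a recursive construct such as PCRE's `(?R)`, only one level has to be written out and the engine handles any depth, which is the kind of saving described above.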
The above pattern covers the RFC spec up to
the addr-spec, which is pretty much what people are thinking about
when they say "email address".
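
As a quick illustration of how far a short pattern is from the addr-spec grammar, the sketch below runs one against a few strings. `SHORT_EMAIL` is a hypothetical pattern made up for this example, not one taken from any particular site. The first two samples use forms that RFC 2822 allows (a quoted-string local part and a domain literal), and the last one has consecutive dots, which the dot-atom rule does not allow.

```python
import re

# Hypothetical short "email" pattern, written only for this illustration.
SHORT_EMAIL = re.compile(r'^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$')

samples = [
    '"john doe"@example.com',  # quoted-string local part: allowed by addr-spec
    'john@[192.168.0.1]',      # domain literal: allowed by addr-spec
    'john.doe@example.com',    # plain dot-atom form
    'john..doe@example.com',   # consecutive dots: not allowed in a dot-atom
]

for address in samples:
    # Print whether the short pattern accepts each sample.
    print(address, bool(SHORT_EMAIL.match(address)))
```

The short pattern rejects the first two samples and accepts the last two, so it is wrong in both directions.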
It's not too hard to take it up a few
more levels in the spec using this syntax
RFC 2822 mailbox:
http://www.myregextester.com/?r=338
but like I said, it likely won't perform
well enough to be useful. I've wrapped the two patterns I've linked to
in anchors, so they only match against the whole
string. Searching for an address within a larger body of text, without anchors, will probably
degrade performance very fast. But if any of you PHP or Perl gurus want to stress test this beast, have fun. Maybe it's not as bad as I think.
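
The difference between the anchored and unanchored use can be sketched in Python. The pattern `ADDRESS` is a deliberately tiny stand-in (an assumption for illustration), not the RFC 2822 pattern linked above, and the helper names are hypothetical. `fullmatch` behaves like a pattern wrapped in anchors and makes a single attempt against the whole string, while `finditer` has to try a match at position after position across the body, which is where a very large pattern would be expected to hurt.

```python
import re

# Tiny stand-in pattern, used only to show anchored vs. unanchored matching.
ADDRESS = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

def is_address(field):
    """Anchored use: the whole field must be an address."""
    return ADDRESS.fullmatch(field) is not None

def find_addresses(body):
    """Unanchored use: scan a larger body of text for addresses."""
    return [m.group(0) for m in ADDRESS.finditer(body)]

print(is_address('jane@example.org'))
print(is_address('mail jane@example.org today'))
print(find_addresses('mail jane@example.org or bob@example.net today'))
```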