RegexAdvice

RegEx ReMarks

Some opinions, advice, and rants about regular expressions.

You want to use a regex to do WHAT?

This is in response to the comments and blog posts made on Jeffrey Schoolcraft's and Darren Neimke's blogs.

I have to disagree with the blanket statement that regular expressions should never, ever be used to try to match email, date, or HTML patterns. I'm almost at a loss as to how to even get into it, because the first thing that comes to mind is simply "Oh come on now, seriously!" Granted, the comment was left on April Fool's Day (although the server claims otherwise)... Anyway, to disprove the blanket statement, it is only necessary to come up with a single useful application for such expressions, preferably one for which a more code-intensive approach would not be well-suited.

How about data entry? I've written a few call center applications, and these days it's also common to run web-based data entry applications on intranets because web-based applications are so easy to update. With any data entry application, speed of entry is important, but so is accuracy. The best time to correct a value is as soon as it's been entered (through input masking, or perhaps an onblur handler). In a web application that typically means JavaScript. JavaScript supports regular expressions quite well, but it isn't really well-suited to communicating with mail servers.

So, ideally, if you want the best user experience, you'll use regular expressions in client-side JavaScript to check emails, dates, phone numbers, etc. Then, when a record is submitted, you can perform more thorough validation. It might be sufficient for the regex just to check that the email is well-formed and the date properly formatted, while the application uses code to try to verify that the email address actually works (or, ideally, requires the user to confirm it by responding to an emailed inquiry). The application would also be responsible for ensuring that dates, beyond being properly formatted, are valid for whatever purpose they're serving (a date of birth should be in the past, for instance).
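A minimal sketch of that client-side half follows. It assumes a deliberately loose email pattern and US-style phone numbers. Both are illustrative choices, not taken from any particular application. The patterns only check shape, so the server still has to do the real validation:

```javascript
// Hypothetical "good enough" patterns for immediate client-side feedback.
// They check shape only; the business layer must still validate on submit.
const PATTERNS = {
  // something@something.something, no whitespace, exactly one "@"
  email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
  // Assumed US-style format: 555-123-4567 or (555) 123-4567
  phone: /^(?:\(\d{3}\)\s?|\d{3}-)\d{3}-\d{4}$/,
};

// Returns an error message for the field, or null if it looks well-formed.
function checkField(kind, value) {
  const pattern = PATTERNS[kind];
  if (!pattern) throw new Error("no pattern for field kind: " + kind);
  return pattern.test(value.trim()) ? null : "Please check the " + kind + " you entered.";
}

// In a page this might be called from an onblur handler, e.g.
//   input.addEventListener("blur", () => showError(checkField("email", input.value)));
// where showError is a hypothetical helper that displays the message.
```

An input such as `http://mysite.com` typed into an email field is rejected immediately, with no round trip to the server.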

Thus, regular expressions can provide 'good enough' validation that catches most typographic errors and improves the user experience with immediate feedback in scenarios where code-intensive validation is impractical. Better validation should then be performed elsewhere, usually in a middle-tier layer that enforces business rules.

This is the most obvious example, and one I've personally encountered many times as a consultant at different companies. I'm sure there are other scenarios where the speed and simplicity of a regex outweigh the value of the additional accuracy that a more code-intensive approach might provide. Again, it's a trade-off about what is 'good enough', and in many cases a regex will suffice even when it doesn't catch every possible error.

Published Friday, April 01, 2005 5:51 PM by ssmith


Comments

 

ssmith said:

Hi Steve... absolutely spot-on. For example, it would be totally stupid to have a field named "Email Address" on your form and have to do a roundtrip just to tell the user that we couldn't find an e-mail address for "http://mysite.com". RegEx should definitely be used for 'Good enough' checks.
April 1, 2005 6:08 PM
 

ssmith said:

No, I'm dead serious. I don't care what date it is. {grin}

Look, I'll let you use a regex to validate an email address client-side as long as it is NO LESS THAN the regex given in
<http://search.cpan.org/src/MAURICE/Email-Valid-0.15/Valid.pm>,
because *that* is what it takes to validate RFC822 addresses.

And I'll let you use a regex to scan through HTML as long as you understand ALL the nuances of HTML. You *are* aware that attribute values can be quoted with either single or double quotes, correct? And that a greater-than can appear in an attribute value without being escaped? And that a comment needs an even number of hyphens, in pairs? And... and... and...
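The sketch below makes the attribute point concrete. A naive `<[^>]*>` tag pattern splits a tag whose quoted attribute contains `>`. A quote-aware variant copes with that one case. The variant is still only a sketch. It ignores comments, `<script>` contents and the other nuances listed above:

```javascript
// Naive pattern: "<", then any run of characters other than ">", then ">".
const NAIVE_TAG = /<[^>]*>/g;

// Quote-aware sketch: a double- or single-quoted value may contain ">".
// Still ignores comments, CDATA, <script> bodies and malformed markup.
const QUOTE_AWARE_TAG = /<(?:"[^"]*"|'[^']*'|[^'">])*>/g;

function findTags(html, pattern) {
  return html.match(pattern) || [];
}

const sample = '<a title="x > y" href=\'/p\'>link</a>';
// findTags(sample, NAIVE_TAG)       -> ['<a title="x >', '</a>']
// findTags(sample, QUOTE_AWARE_TAG) -> [`<a title="x > y" href='/p'>`, '</a>']
```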

And I'll let you use a regex to validate a date, as long as it's simply looking for mm-dd-yyyy, and not trying to figure out whether there are 28 or 29 days in this month. That's better done in other code. Not a regex.
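The sketch below shows that division of labor, assuming the mm-dd-yyyy layout mentioned above. The regex checks only the shape. Plain code then applies the Gregorian leap-year rule to decide whether February has 28 or 29 days:

```javascript
// The regex checks only the shape mm-dd-yyyy (month 01-12, day 01-31).
const MM_DD_YYYY = /^(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-(\d{4})$/;

// Gregorian rule: every 4th year, except centuries not divisible by 400.
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

function daysInMonth(month, year) {
  const days = [31, isLeapYear(year) ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return days[month - 1];
}

// Regex for the format, ordinary code for the calendar rules.
function isValidDate(text) {
  const m = MM_DD_YYYY.exec(text);
  if (!m) return false;
  const month = Number(m[1]);
  const day = Number(m[2]);
  const year = Number(m[3]);
  return day <= daysInMonth(month, year);
}
```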

The problem is that most people construct regexen based on very limited exposure to valid vs invalid examples. So they get these cases wrong. Very wrong.

Thus, my blanket prohibition, unless otherwise understood.
April 1, 2005 8:16 PM
 

ssmith said:

Randal, amen brother!

Steve, I'll concede there are times when a regex can be used for "macroscopic" or "good enough" validation, as long as everyone who reads this, or tries to use a regex for these tasks, understands that IT IS NOT A COMPLETE SOLUTION (capitals for emphasis, since HTML tags won't work here).

Having users type their email address twice is inconvenient, but a simple equality check will catch typing mistakes. Having them confirm the address by following a URL, handled by an HttpModule, that was emailed to the address they registered with provides the best validation possible.
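Both steps can be sketched in a few lines. This sketch uses Node's built-in `crypto` module rather than an ASP.NET HttpModule. The confirmation URL and its `token` parameter are hypothetical. Storing the token, and checking it when the link is followed, are left out:

```javascript
const crypto = require("crypto");

// Step 1: the "type it twice" check. Only surrounding whitespace is ignored;
// the local part of an address may be case-sensitive, so case is kept as typed.
function emailsMatch(first, second) {
  return first.trim() === second.trim();
}

// Step 2: an unguessable token for the confirmation link emailed to the user.
// baseUrl and the "token" query parameter are hypothetical placeholders.
function makeConfirmationLink(baseUrl) {
  const token = crypto.randomBytes(16).toString("hex"); // 32 hex characters
  return { token, url: baseUrl + "?token=" + token };
}
```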

Use calendars for dates, or try to parse them in the business layer.

Regardless, any validation you think you're doing at the UI level, certainly with JavaScript functions, should be duplicated or at least handled in the business layer, because anyone can disable JavaScript.
April 3, 2005 12:15 PM
 

ssmith said:

let us see!
April 4, 2005 4:54 PM
 

ssmith said:

To further pile onto Brother Randal's and Brother Jeff's excellent points, one can always use XMLHttpRequest within JavaScript to ask a server whether something is valid. If I'm not mistaken, it may be faster to do this and ask Email::Valid whether an email is valid than to ask JavaScript's regex engine (which may or may not be implemented correctly!) to do it.
April 4, 2005 4:59 PM
 

ssmith said:

I want to use regexes to parse and replace all the tags in a vBulletin board editor, for example:
[B]WORDS[/B] --> <b>WORDS</b>
[IMG]http://www.doaminname.com/img.gif[/IMG] --> <img src="http://www.doaminname.com/img.gif" >
[FONT=Arial] BLA [/FONT] --> <font face='arial'>BLA</font>

Thanks
April 12, 2005 12:43 AM
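Here is a minimal JavaScript sketch of those three conversions. It assumes tags are not nested, and it accepts only http and https image URLs. It HTML-escapes the input first so that posted text cannot inject markup. The font name is kept as typed rather than lowercased. Nested tags of the same kind, such as `[B][B]x[/B][/B]`, are not handled correctly by these lazy patterns:

```javascript
// Escape characters that are special in HTML so user text cannot inject markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function bbcodeToHtml(input) {
  let html = escapeHtml(input);
  // [B]...[/B] -> <b>...</b> (lazy match, case-insensitive tag names)
  html = html.replace(/\[B\]([\s\S]*?)\[\/B\]/gi, "<b>$1</b>");
  // [IMG]url[/IMG] -> <img src="url">, only for http:// or https:// URLs
  html = html.replace(/\[IMG\](https?:\/\/[^\s\[\]"]+?)\[\/IMG\]/gi, '<img src="$1">');
  // [FONT=name]...[/FONT] -> <font face="name">...</font>, simple font names only
  html = html.replace(/\[FONT=([A-Za-z0-9 ,-]+)\]([\s\S]*?)\[\/FONT\]/gi, '<font face="$1">$2</font>');
  return html;
}
```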
 

ssmith said:

I want to use regex's to clean my kitchen.
April 13, 2005 2:06 PM
 

ssmith said:

I'd be interested to see stats on how often validating an email address client-side using regex actually does anything useful.

I'd expect very little: most input errors will be simple typos that won't be caught by regex validation.

As for date validation, that's better done by parsing the date and redisplaying in a canonical format.
April 13, 2005 3:10 PM
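Parsing and redisplaying might look like the sketch below. It assumes US month-first input with `/`, `-` or `.` as separators. JavaScript's `Date` does the calendar arithmetic. Any date that `Date` had to roll over, such as 2/30, is rejected:

```javascript
// Accepts m/d/yyyy, m-d-yyyy or m.d.yyyy (month-first order is an assumption)
// and returns the canonical yyyy-mm-dd form, or null if the date does not exist.
function canonicalDate(text) {
  const m = /^\s*(\d{1,2})[\/.-](\d{1,2})[\/.-](\d{4})\s*$/.exec(text);
  if (!m) return null;
  const month = Number(m[1]);
  const day = Number(m[2]);
  const year = Number(m[3]);
  // Date.UTC silently rolls invalid dates over (2/30 becomes early March),
  // so compare the parts it produced with the parts that were typed.
  const d = new Date(Date.UTC(year, month - 1, day));
  if (d.getUTCFullYear() !== year || d.getUTCMonth() !== month - 1 || d.getUTCDate() !== day) {
    return null;
  }
  const pad = (n) => String(n).padStart(2, "0");
  return year + "-" + pad(month) + "-" + pad(day);
}
```

Showing the user the canonical form right away (for example `2005-04-01` for `4/1/2005`) also makes a day/month mix-up easy to spot.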
 

TrackBack said:

April 1, 2005 5:55 PM
 

TrackBack said:

April 6, 2005 9:28 AM
 

AlanM said:

I'm in general agreement with Steve, although Jeff and Randal have a point that most email regular expressions are not good enough. However, even the Perl-based regex that Randal alluded to is not quite "all-encompassing" in its embrace of RFC 2822 (I don't track RFC 822 because it's obsolete). If you run the Perl-based regex on one of the (admittedly) odd sample email addresses listed in Appendix A.5 of RFC 2822, it fails to recognize the valid address: John (my dear friend). I presume it doesn't recognize the valid comment in the parentheses. The point of RFC 2822 was NOT to define a way to create a "well-formed" email address (whatever that might mean). It was to give messages a format where, among other things, the email address, such as it is, can be parsed from the contents of the message. Naturally, it needs to be as forgiving as possible, even on internal networks. So, addresses like .@2 are quite valid RFC 2822 addresses, but they are hardl
May 3, 2005 8:23 PM
 

AlanM said:

hardly (oops, damnit!). I meant that !@2 is a valid RFC 2822 address (whereas .@2 is not). But for a public or semi-public site on the net, your business rules will usually demand something a little more likely to be an "actual" address, RFC 2822 be damned. So there's a need for a better regex for email addresses, but not a need for "no regex at all." What would help a lot is a better definition than RFC 2822 of a "well-formed" email address.
May 3, 2005 8:29 PM
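For illustration, here is a sketch of only the simplest RFC 2822 address form: a dot-atom, an `@`, then another dot-atom. It is built from the RFC's `atext` character set. It accepts `!@2` and rejects `.@2`, as described above. It deliberately omits quoted local parts, domain literals and comments. It is therefore neither a complete RFC 2822 validator nor a business-rule check:

```javascript
// atext as defined in RFC 2822: letters, digits and these special characters.
const ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";
// dot-atom-text: runs of atext separated by single dots, no leading/trailing dot.
const DOT_ATOM = ATEXT + "+(?:\\." + ATEXT + "+)*";
// Only the dot-atom "@" dot-atom form of addr-spec; quoted local parts,
// domain literals and comments (CFWS) are NOT handled by this sketch.
const DOT_ATOM_ADDR_SPEC = new RegExp("^" + DOT_ATOM + "@" + DOT_ATOM + "$");

function isDotAtomAddrSpec(address) {
  return DOT_ATOM_ADDR_SPEC.test(address);
}
```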
 

AlanM said:

Okay, I posted what I hope is a better mousetrap^H^H^H^H^H^H^H^H^H regular expression to SORT OF validate email addresses to regexlib: http://www.regexlib.com/REDetails.aspx?regexp_id=1074. This should be a little better than "good enough". I hope Jeff and Randal don't give it an automatic "1". BTW, the Perl regular expression that validates email addresses is no good for validating user-input addresses. It's not even completely good for validating addresses in an SMTP host. It's just big and scary looking, and I guess that is supposed to mean that it is really good, right? (snickers).
May 5, 2005 12:59 AM
 

Dean said:

I'm having trouble getting a regex to return the tables involved in a SQL string where aliases may have been used for the tables. Any ideas?

E.g.:
SELECT

'System Date' AS FLAG,

'' AS AGREEMENT_NUMBER,
0 AS TRANSACTION_DATE,
0 AS TRANSACTION_TIME,
'' AS TRANSACTION_CODE,
'' AS DESCRIPTION,
-- '' AS NARRATIVE,
-- '' AS USER,
-- '' AS WORKSTATION,

'' AS COMPANY_CODE,
'' AS COMPANY,
'' AS DEPARTMENT_CODE,
'' AS BRANCH_DESCRIPTION,
'' AS PRODUCT_DESCRIPTION,
'' AS SALES_PERSON,

'' AS PRODUCT_CODE,
0 AS SCHEDULE_START_DATE,

SUBSTR("PDTAARA"."PDTAVAL",17,8) AS PRIOR_SYSTEM_DATE

FROM PDTAARA

WHERE
PDTARNM = 'CHPDATDTA' AND PDTVLST = 201

UNION ALL

SELECT

'Dataset' AS FLAG,
SUBSTRING(B.ENTITY_KEY,1,10) AS AGREEMENT_NUMBER,
B.TRANSACTION_DATE AS TRANSACTION_DATE,
B.TRANSACTION_TIME AS TRANSACTION_TIME,
B.TRANSACTION_CODE AS TRANSACTION_CODE,
TRIM(C.TRANSACTION_DESCRIPTION) AS DESCRIPTION,
--TRIM(B.TRANSACTION_NARRATIVE) AS NARRATIVE,
--TRIM(B.USER) AS USER,
--TRIM(B.WORKSTATION_ID) AS WORKSTATION,

L.PGLTCOY AS COMPANY_CODE,
L.PRPCONM AS COMPANY,
L.LN0DPT AS DEPARTMENT_CODE,
L.PBCHDES AS BRANCH_DESCRIPTION,
L.PPRDDES AS PRODUCT_DESCRIPTION,
L.PUSEDES AS SALES_PERSON,

S.PANCDE1 AS PRODUCT_CODE,
S.SCHEDULE_START_DATE AS SCHEDULE_START_DATE,

'' AS PRIOR_SYSTEM_DATE

FROM PTXNHST B
INNER JOIN PTXNFIL C ON C.TRANSACTION_CODE = B.TRANSACTION_CODE
INNER JOIN LNRDUCRF L ON L.AAGRNUM = SUBSTRING(B.ENTITY_KEY,1,10)
INNER JOIN ASCHEDL S ON SUBSTRING(B.ENTITY_KEY,1,10)= S.AAGRNUM

WHERE
B.TRANSACTION_CODE IN ( 'A45' )
AND
B.TRANSACTION_DATE >= CASE ('{?Run Type}')
WHEN 'Prior System Date' THEN
(SELECT CAST(SUBSTRING("PDTAARA"."PDTAVAL",17,8) AS INT)
FROM
"PDTAARA" "PDTAARA"
WHERE
"PDTAARA"."PDTARNM" = 'CHPDATDTA' AND "PDTAARA"."PDTVLST" = 201)
ELSE ( 10000*YEAR({?From Date}) + 100*MONTH({?From Date}) + DAY({?From Date})) END
February 20, 2006 3:50 AM
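A regex can only approximate this, and a real SQL parser is the robust route. The rough sketch below rests on several assumptions. Table names must follow `FROM` or `JOIN`. Each name must be a bare word or a double-quoted identifier. An optional alias may follow, with or without `AS`, as long as it is not one of a few keywords:

```javascript
// Hypothetical sketch: list { table, alias } pairs that follow FROM or JOIN.
// Identifiers may be bare words or double-quoted, e.g. "PDTAARA".
const IDENT = '(?:"[^"]+"|\\w+)';
// Words that can directly follow a table name and must not be taken as an alias.
const NOT_ALIAS = "(?:WHERE|ON|INNER|LEFT|RIGHT|FULL|CROSS|JOIN|UNION|GROUP|ORDER|HAVING)\\b";
const TABLE_RE = new RegExp(
  "\\b(?:FROM|JOIN)\\s+(" + IDENT + ")" + // table name
    "(?:\\s+(?:AS\\s+)?(?!" + NOT_ALIAS + ")(" + IDENT + "))?", // optional alias
  "gi"
);

function findTables(sql) {
  const unquote = (s) => s.replace(/"/g, "");
  const result = [];
  for (const m of sql.matchAll(TABLE_RE)) {
    result.push({ table: unquote(m[1]), alias: m[2] ? unquote(m[2]) : null });
  }
  return result;
}
```

On the query above this should report PDTAARA (no alias), PTXNHST B, PTXNFIL C, LNRDUCRF L, ASCHEDL S, and "PDTAARA" "PDTAARA" from the subquery. It does not handle comma-separated FROM lists, schema-qualified names, derived tables, `--` comments that contain FROM, or other uses of `FROM` such as `SUBSTRING(x FROM 1)`.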
 

Mahak said:

Hi,

I have some HTML code, and I want to extract a piece of it from the HTML content.

The HTML code I have:

"<p class="style3">
<script type="text/javascript"><!--
google_ad_client = "pub-5144722207444267";
/* 728x15, created 11/13/08 */
google_ad_slot = "4414587854";
google_ad_width = 728;
google_ad_height = 15;
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script></p>
<center>
<div id="container">
<p class="style10"><strong>Mobile
Code :  <span id="Label2">9988 <br/><br/> State : Punjab. <br/><br/> Reference City : Amritsar/Jalandhar/Patyala. <br/><br/> Service Provider : Vodafone. [ GSM ]</span>
</strong></p> </div>
<p class="style11">&nbsp;&nbsp;<a href="javascript:history.go(-1)" style="text-decoration: none"><font color="#000000">BACK</font></a></p>
<p class="style3">
<script type="text/javascript"><!--
google_ad_client = "pub-5144722207444267";
/* 728x90, created 11/13/08 */
google_ad_slot = "3275304904";
google_ad_width = 728;
google_ad_height = 90;
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
</p>
</center>"

And the code I want to extract from the above code is:

"Mobile Code :  <span id="Label2">9988 <br/><br/> State : Punjab. <br/><br/> Reference City : Amritsar/Jalandhar/Patyala. <br/><br/> Service Provider : Vodafone. [ GSM ]"

Can someone please write a regular expression for this in PHP or in .NET?

A response would be appreciated.

Thanks

Mahak

http://search-end.com

May 15, 2009 9:08 AM
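The question asked for PHP or .NET. To stay with the JavaScript the post discusses, here is a JavaScript sketch instead. It assumes the markup keeps exactly the shape shown above: a `<strong>` containing "Mobile", followed by "Code :" and the `<span id="Label2">`. It joins "Mobile" and "Code" with a single space because the source splits them across lines. The pattern uses only lazy quantifiers and character classes, so it should carry over to PHP's `preg_match` or .NET's `Regex` with little change:

```javascript
// Captures the "Mobile" label and everything from "Code :" up to (but not
// including) the closing </span> of the element with id="Label2".
// Assumes the page template keeps exactly this shape; any change breaks it.
const MOBILE_CODE_RE = /<strong>\s*(Mobile)\s+(Code\s*:\s*<span id="Label2">[\s\S]*?)<\/span>/;

function extractMobileCode(html) {
  const m = MOBILE_CODE_RE.exec(html);
  // Rejoin the label with a single space, since the source splits it across lines.
  return m ? m[1] + " " + m[2] : null;
}
```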
 

Sample Email Addresses | Email Marketing said:

October 26, 2011 1:38 PM
 
