
Michael Ash's Regex Blog

Regex Musings

Are you ready for regex?

Who should be using regular expressions?

This has been on my mind for a while: there are some people who shouldn't be using regular expressions. People who don't know what regular expressions are for. :^) Now, you can use regexes in a variety of ways, and you can debate either side of whether certain applications are good or bad uses, but that's not what I'm talking about. I'm not even talking about people who don't know how to write regexes, well or at all. I'm talking about people who are trying to use regular expressions but have no idea what regular expressions do. And I don't mean that you can't decipher a regex pattern by looking at it. I mean you don't understand the general concept that regular expressions match patterns in data.

I've seen several posts over on the regexadvice.com forums in the past few months that go something like “I have this problem and someone told me I should use a regular expression.” Unless that same someone bothered to explain what a regular expression does or why it is suited to your task, don't be in such a hurry to plug in a regex.

I'm not saying you need to master regexes before using them; just get your head around the basics (matching). At the very least you should associate regexes with wildcard searches, only more powerful. If you are at least that far, proceed with trying to implement your regex. But if you are thinking of something completely different, slow down. Unless you have a basic understanding of what regular expressions do, even simple tasks become far more difficult than they should be. And if you get someone to help you write a regex and still don't know what it's doing, you are likely to keep using them incorrectly.
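
To make the wildcard comparison concrete, here is a small sketch in Python (my choice of language for illustration; the file names are made up). Both searches match a pattern against strings. The regex is just able to be pickier about what counts as a match.

```python
import fnmatch
import re

# Hypothetical file names, invented for this example.
filenames = ["report1.txt", "report22.txt", "reportA.txt", "notes.txt"]

# Wildcard search: "*" stands for any run of characters.
wildcard_hits = fnmatch.filter(filenames, "report*.txt")

# Regex search: same idea, but the pattern can insist on digits only.
digits_only = re.compile(r"report[0-9]+\.txt")
regex_hits = [name for name in filenames if digits_only.fullmatch(name)]

print(wildcard_hits)  # every name that starts with "report"
print(regex_hits)     # only the names where "report" is followed by digits
```

The wildcard keeps reportA.txt because it cannot say "digits only"; the regex can, and drops it.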


A common result of this lack of understanding that I've been seeing is people trying to perform a task solely by writing a regex pattern, when the task is one that regexes themselves have no capacity to perform. Not that a regex couldn't be part of the solution, but in the end it isn't the regex that does what the person wants done. Usually it's a coding task, and the regex can at least do some of the prep work. Even if some implementations of the regex engine allow code to be called, it is still the code doing the heavy lifting. The regex is only finding the data. People could save themselves hours of wasted time if they just understood the basics of what they are getting themselves into.
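
A minimal sketch of that division of labor, again in Python and with invented sample text: the regex finds the dollar amounts, and ordinary code does the arithmetic. The pattern assumes amounts always have exactly two decimal places, which is an assumption of this example and not a general rule.

```python
import re
from decimal import Decimal

# Hypothetical input text, invented for this example.
text = "Widgets cost $4.50, gadgets cost $12.25, and shipping is $3.00."

# The regex only locates the amounts; it has no idea they are money.
amount_pattern = re.compile(r"\$([0-9]+\.[0-9]{2})")
found = amount_pattern.findall(text)

# Ordinary code does the real work: convert the strings and add them up.
total = sum(Decimal(amount) for amount in found)

print(found)
print(total)
```

No regex, however clever, will produce that total. The best it can do is hand the three strings to code that knows how to add.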


Another thing I've seen, related to not having a basic understanding, is a task that is not only perfectly suited to a regex but is a very common application of one, and the person asking, “Is this something that a regex can do?” Of the issues I've mentioned, this one is both the most and the least understandable. The most because regular expressions are not simple for everyone; to some they come easily, to others they never come completely. And some of the documentation isn't the greatest, though there is plenty of good documentation out there these days. So I can understand why some can't get their head around exactly how to write a regex. But it is the least understandable when they are asking how to write a regex so common, and usually so simple, that dozens of tutorials and articles on regexes use the very regex they are asking for as an example.


I think the main problems of people who fall into this category are:


  1. Lack of research. There are more than enough tutorials on the web, and more than a few books, with simple samples that give you an idea of what a regex does. I find it very hard to believe when someone says they couldn't find a regex for basic (US) zip codes. Did you even look? Or are you only using a regex for this task because someone told you that you should, and you ran out to find someone to write it for you? Or are you just cheating on your homework?

  2. Misunderstanding what regular expressions understand. There seem to be more than a few people who think a regex understands the context of the data it matches: that it not only knows how to match the data but also knows what it means, either in the context of their application or to the world at large. Those are the people who think a regex for matching zip codes knows what zip codes are used for and where they would be used. Sorry, but that's not the case. Regexes understand nothing of what your data means to you. As far as the regex engine is concerned, it's just a string of characters. It's up to the regex author to know the context of the data they want to match and to shape the regex accordingly so that it returns only the relevant data. This may make it seem like the regex understands the data it's matching, but that's not what is happening. What happened is that the person who wrote the regex understood the data and the problem set so well that they were able to construct a regex that matches only the relevant values.

  3. Thinking that regex is a full-blown programming language. It's not. I've seen questions about wanting a regex to compare numbers, tell time, or perform some other function completely outside its realm, something that most programming languages either have a feature for or let you write code to handle. I've never seen any regex documentation promoting such features, so I can't imagine why someone would think a regex can perform these tasks, other than my previous point, where they saw the results of a well-written regex and speculated about what and how much work the regex did. As I mentioned, some implementations allow you to make function calls, but that is more an add-on of the programming environment you are using than a generic regex feature. Realize that regular expressions are one of many features of your programming language, not the other way around.
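
Points 2 and 3 can be sketched with the basic (US) zip code example, assuming the usual shape of five digits with an optional hyphen and four more. The function names and the numeric range below are hypothetical, chosen only for illustration.

```python
import re

# Shape of a basic US zip code: five digits, optionally "-" and four more.
ZIP_PATTERN = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")

def looks_like_zip(value: str) -> bool:
    """True if the whole string has the shape of a basic US zip code."""
    return ZIP_PATTERN.fullmatch(value) is not None

def zip_in_range(value: str, low: int, high: int) -> bool:
    """Shape check by regex; the numeric comparison is ordinary code."""
    if not looks_like_zip(value):
        return False
    return low <= int(value[:5]) <= high

# Point 2: the regex sees characters, not meaning. A five-digit order
# number has the same shape as a zip code, so it matches just as well.
print(looks_like_zip("90210"))       # zip-shaped
print(looks_like_zip("90210-1234"))  # zip-shaped, with the four-digit extension
print(looks_like_zip("9021"))        # too short

# Point 3: "is this number between X and Y?" is a job for code.
# The range here is a made-up example, not a real postal region.
print(zip_in_range("90210", 90000, 96199))
print(zip_in_range("10001", 90000, 96199))
```

Whether a zip-shaped string actually is a zip code, and whether it belongs in your data, is something only the surrounding code and the person writing it can decide.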


If you've read this far and you didn't know what regular expressions did or didn't do before, hopefully by now you have some idea. And if you already knew what they did, stop to consider this the next time you advise someone to use a regex: make sure you get across the high-level point that you are “matching something (a character pattern) in a string” before you get into the more complex aspects of what a regex can do.

Published Friday, June 01, 2007 12:02 PM by mash

Comments

 

Stevezilla00 said:

Good post, Michael. Unfortunately, I think the regexadvice forums almost encourage people to not bother doing any research, given the quick turnaround most people experience here when asking others to write regexes for them.
June 1, 2007 6:32 PM
 

mash said:

You are right, Steve. I'm sure a lot of people come here looking for results and aren't at all concerned with how they are achieved. Most likely the same person who told them to use a regex also told them to go to the RegexAdvice forum. I think the lack of feedback once a solution is given reflects that. Also, when questions are being answered, there is more often a tendency to give out just answers than advice. I don't think it would be a bad thing for someone responding with a regex solution to throw in their two cents about why a regex may not be the best solution to a given problem, even though it can be done via regex.

June 4, 2007 11:11 AM
 

Brendan said:

Very true. Most of the people in the forums do just get an answer and leave. I usually try to put a small explanation with my replies. This sort of problem is found in more than just regular expressions, though regular expressions are a very good example of it. There are many programmers out there who should probably not be writing anything more complicated than a "Hello world" program and who, through many forums, manage to stumble their way into an almost-working application. I'd say that when people come along saying they've heard they should use a regex, you should make sure you give some explanation with the answer. Maybe it will get their attention enough that they will do some research.

June 4, 2007 3:32 PM