Who should be using regular expressions?
This has been on my mind for a while: there are some people who
shouldn't be using regular expressions, namely people who don't know
what regular expressions are for. :^) Now, you can use regexes in a
variety of ways, and you can debate either side on whether certain
applications are good uses or bad uses, but that's not what I'm
talking about. I'm not even talking about people who don't know how
to write regexes, well or at all. I'm talking about people who try to
use regular expressions but have no idea what regular expressions do.
And I don't mean that you can't decipher a regex pattern by looking
at it. I mean you don't understand the general concept that regular
expressions match patterns in data.
I've seen several posts over on the regexadvice.com forums in the
past few months which go something like “I have this problem and
someone told me I should use a regular expression.” Unless that same
someone bothered to explain what a regular expression does or why it
is suited to your task, don't be in such a hurry to plug in a regex.
I'm not saying you need to master regexes before using them; just get
your head around the basics (matching). At the very least you should
associate regexes with wildcard searches, just more powerful. If you
are at least that far, proceed with trying to implement your regex.
But if you are thinking something completely different, slow down
there. Unless you have a basic understanding of what regular
expressions do, even simple tasks become way more difficult than they
should be. Even if you get someone to help you write a regex, if you
still don't know what it's doing you are likely to keep trying to use
them incorrectly.
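To make the wildcard comparison concrete, here is a minimal sketch in Python. The file names are made up for illustration, and Python is just my choice of language; the same idea applies in any regex flavor. Both approaches match a pattern against text, but the regex can say "one or more digits", which the `?` and `*` wildcards alone cannot.

```python
import fnmatch
import re

# Hypothetical file names, invented for this example.
filenames = ["report1.txt", "report22.txt", "reportA.txt", "notes.txt"]

# Wildcard search: "?" stands for exactly one character of any kind.
wildcard_hits = [
    name for name in filenames
    if fnmatch.fnmatchcase(name, "report?.txt")
]

# Regex search: "\d+" means one or more digits, so the pattern is
# both stricter (digits only) and more flexible (any count of them).
report_pattern = re.compile(r"report\d+\.txt")
regex_hits = [name for name in filenames if report_pattern.fullmatch(name)]

print(wildcard_hits)  # ['report1.txt', 'reportA.txt']
print(regex_hits)     # ['report1.txt', 'report22.txt']
```

Either way, all that happens is matching: each name either fits the pattern or it doesn't.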
A common result of this lack of understanding that I've been seeing
is people trying to perform a task solely by writing a regex pattern
when regexes themselves have no capacity to perform it. Not that a
regex couldn't be part of the solution, but in the end it's not what
will do what the person wants done. Usually it's a coding task, and
the regex can at most do some of the prep work. Even if some
implementations of the regex engine allow code to be called, it is
still the code doing the heavy lifting. The regex is only finding the
data. People could save themselves hours of wasted time if they just
understood the basics of what they are getting themselves into.
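Here is a small sketch of that division of labor, again in Python and with an invented input string. The regex only finds the quantities; the adding up is plain code.

```python
import re

# Made-up input, purely for illustration.
text = "Order 1: 3 widgets, order 2: 12 widgets, order 3: 7 widgets"

# The regex's whole job: locate digit runs followed by " widgets".
quantity_pattern = re.compile(r"(\d+) widgets")
quantities = [int(found) for found in quantity_pattern.findall(text)]

# The arithmetic is ordinary code; no pattern can do this part.
total = sum(quantities)

print(quantities)  # [3, 12, 7]
print(total)       # 22
```

If the question is "what regex gives me the total?", the honest answer is that there isn't one. The pattern hands the pieces to your code, and your code does the work.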
Another thing I've seen, related to not having a basic
understanding, is a task that is not only perfectly suited to a regex
but is a very common application of one, and the person asking the
question asks “Is this something that a regex can do?” Of the issues
I've mentioned, this one is the most and the least understandable.
The most because regular expressions are not simple for everyone; to
some they come easy, to others they never come completely. And some
of the documentation isn't the greatest, though there is plenty of
good documentation out there these days. So I can understand why some
can't get their head around exactly how to write a regex. But it is
the least understandable when they are asking how to write a regex so
common, and usually so simple, that dozens of tutorials and articles
on regexes use the very regex they are asking for as an example.
I think the main problems of people
who fall into this category are:
Lack of research. There are more than enough tutorials out on the
web, and more than a few books, that have simple samples to give you
an idea of what a regex does. I find it very hard to believe when
someone says they couldn't find a regex for basic (US) zip codes. Did
you even look? Or are you only using a regex for this task because
someone told you that you should, and you ran out to find someone to
write it for you? Or are you just cheating on your homework?
Misunderstanding what regular expressions understand. There seem to
be more than a few people who think a regex understands the context
of the data it matches: that it not only knows how to match the data
but also knows what it means, either in the context of their
application or to the world at large. Those are the people who think
a regex for matching zip codes knows what zip codes are used for and
where they would be used. Sorry, but that's not the case. Regexes
understand nothing of what your data means to you. As far as the
regex engine is concerned, it's just a string of characters. It's up
to the regex author to know the context of the data they want to
match and to shape the regex accordingly so it returns only the
relevant data. This may make it seem like the regex understands the
data it's matching, but that's not what's happening. What happened is
that the person who wrote the regex understood the data and the
problem set so well that they were able to construct a regex that
only matches the relevant values.
Thinking that regex is a full-blown programming language. It's not.
I've seen questions about wanting a regex to compare numbers, tell
time or do some other function completely outside its realm, but
something most programming languages either have a feature to deal
with or let you write code for. I've never seen any regex
documentation promoting such features, so I can't imagine why someone
thinks a regex can perform these tasks, other than my previous point,
where they saw the results of a well-written regex and speculated on
what and how much work the regex did. Like I mentioned, some
implementations allow you to perform function calls, but that is more
an add-on of the programming environment you are using than a generic
regex feature. Realize that regular expressions are one of many
features of your programming language, not the other way around.
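The last two points can be sketched in a few lines of Python. The pattern below is one commonly seen shape for basic US zip codes (five digits, optionally a hyphen and four more), not the only way to write it. The sample strings and the helper name `first_number_is_larger` are invented for this example.

```python
import re

# Five digits, optionally followed by "-" and four more digits.
ZIP_SHAPE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# Made-up sample lines. The regex has no idea which of these
# is an address and which is a part number.
samples = [
    "Ship to Beverly Hills, CA 90210",
    "Part number 12345 is back in stock",
    "ZIP+4 form: 12345-6789",
]
matches = [ZIP_SHAPE.findall(line) for line in samples]
print(matches)  # [['90210'], ['12345'], ['12345-6789']]


def first_number_is_larger(text):
    """Hypothetical helper: compare the first two integers in text.

    The regex only finds the digit runs; int() and ">" do the
    comparing. Assumes the text contains at least two numbers.
    """
    first, second = (int(found) for found in re.findall(r"\d+", text)[:2])
    return first > second


print(first_number_is_larger("9 apples versus 10 apples"))  # False
```

The part number matches just as happily as the real zip code, because to the engine both are five digits in a row. And the comparison only comes out right because the code converts the matched text to integers; compared as strings, "9" would sort after "10".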
If you've read this far and you didn't know what regular expressions
did or didn't do before, hopefully by now you have some idea. And if
you already knew what they did, just stop to consider this: the next
time you advise someone to use a regex, make sure you get across the
high-level point that you are “matching something (a character
pattern) in a string” before you get into the more complex aspects of
what a regex can do.