
qr{ ^ \QIain Truskett's Regex Blog\E $ }x

Regularly expressing

Making regex dynamically

Many people treat their regex as static input: you write the regex into your code exactly as you intend to use it. Sometimes, though, it's useful to build one on the fly.

Most languages just treat regex as strings. Some let you compile them into objects, so they don't have to be recompiled for repeated matching. In Perl, that's what qr// does. For example:

    my $date_RE = qr/$days_RE, \s+ $months_RE \s+ \d+(?:th|st|nd|rd)/x;

This neatly makes a regex that can be passed around like any other variable, and also means that any regex that uses this one inside it is more legible. Notice the use of $days_RE and $months_RE inside that one. (Note on variable names: I don't normally use Hungarian Notation, but for some reason I tend to label my regex objects with a _RE suffix.)
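To illustrate "passed around like any other variable", here is a minimal sketch. The names $year_RE, $iso_date_RE and count_matches are made up for this example and are not from the program discussed in this post.

```perl
use strict;
use warnings;

# Hypothetical building blocks, each compiled once with qr//.
my $year_RE     = qr/\d{4}/;
my $iso_date_RE = qr/\b $year_RE - \d{2} - \d{2} \b/x;

# A regex object is an ordinary scalar, so it can be a function argument.
sub count_matches {
    my ( $re, @lines ) = @_;
    return scalar grep { $_ =~ $re } @lines;
}

my @lines = ( 'Posted 2003-12-18', 'no date here', 'Edited 2005-03-05' );
my $count = count_matches( $iso_date_RE, @lines );
print "$count lines contain a date\n";
```

The smaller regex is interpolated into the larger one by name, which is the same trick $date_RE uses with $days_RE and $months_RE.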

That's not really very dynamic, though. So where did $days_RE and $months_RE come from?

    my $days_RE = make_re( qw(
        Monday Tuesday Wednesday Thursday Friday Saturday Sunday
    ));
    my $months_RE = make_re( qw(
        January February March April May June July
        August September October November December
    ));

What's make_re? A simple function that builds a regex matching any one of the elements in the list it is passed.

    sub make_re { qr/(?:@{[ join( '|', @_ ) ]})/ }

At which point you think "Damn, Perl looks ugly!". Mind you, I'd like to see the code for that in other languages. The most important thing is that it's hidden away in a (badly named) subroutine so the actual code you'd be concentrating on is more legible. And, in turn, regex that use these variables are more legible and involve less repetition. This can only be a good thing.
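For comparison, here is a longer-winded sketch of the same idea, wired up end to end. The name make_alternation_re is hypothetical, and the quotemeta call is an addition that the one-liner above does not have: it is there on the assumption that you might one day pass in items containing regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical, more verbose cousin of make_re. quotemeta makes each item
# match literally, even if it contains characters such as "." or "+".
sub make_alternation_re {
    my @items = @_;
    my $alternation = join '|', map { quotemeta($_) } @items;
    return qr/(?:$alternation)/;
}

my $days_RE = make_alternation_re( qw(
    Monday Tuesday Wednesday Thursday Friday Saturday Sunday
) );
my $months_RE = make_alternation_re( qw(
    January February March April May June July
    August September October November December
) );
my $date_RE = qr/$days_RE, \s+ $months_RE \s+ \d+(?:th|st|nd|rd)/x;

# Try one string in the expected shape and one that is not.
for my $text ( 'Thursday, December 18th', 'Thursday, 18 December' ) {
    my $verdict = $text =~ $date_RE ? 'matches' : 'does not match';
    printf "%-25s %s\n", $text, $verdict;
}
```

For plain day and month names the escaping changes nothing, so this should behave the same as make_re here; it only matters once the input list stops being so well behaved.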

make_re doesn't create particularly efficient regex but, in this case, that doesn't really matter. If you want more efficient ones, Jarkko Hietaniemi wrote Regex::PreSuf: a nifty module that creates optimised regex to do this sort of matching. It would just be overkill for this particular program. If we suddenly get a swag of new months or days where there is more overlap in the leading characters, then it would make sense to switch to using this module.

For those wondering, this is the program I was thinking of when making those regex. They're reasonably strict regex so that if the page changes, I find out rather than have the program start guessing its way around.
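One way to get that "find out rather than guess" behaviour is to anchor the match and die when it fails. This is only a sketch: parse_date_line is a hypothetical helper, and the $date_RE below is a cut-down stand-in rather than the one from the real program.

```perl
use strict;
use warnings;

# Hypothetical helper: insist that the whole line is a date in the expected
# shape, and die with a clear message when it is not.
sub parse_date_line {
    my ( $line, $date_RE ) = @_;
    $line =~ /\A ($date_RE) \z/x
        or die "Unexpected date format: <$line>\n";
    return $1;
}

# Stand-in for the real date regex, kept short for the example.
my $date_RE = qr/(?:Monday|Thursday), \s+ (?:March|December) \s+ \d+(?:th|st|nd|rd)/x;

print parse_date_line( 'Thursday, December 18th', $date_RE ), "\n";

# A changed page layout is reported instead of being silently half-parsed.
my $ok = eval { parse_date_line( 'Thu 18 Dec', $date_RE ); 1 };
print "Caught: $@" unless $ok;
```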

Published Thursday, December 18, 2003 7:45 AM by spoon