Mimicking Conditionals

Excited by the fact that I can mimic atomic groups when using most regex libraries which don't support them, I set my sights on another of my most wanted features which is commonly lacking: conditionals (which provide an if-then-else construct). Of the regex libraries I'm familiar with, conditionals are only supported by .NET, Perl, PCRE (and hence, PHP's preg functions), and JGSoft products (including RegexBuddy).

There are two common types of regex conditionals in those libraries: lookaround-based and capturing-group-based. I'll get to the former type in a bit, but first, I'll address capturing-group-based conditionals, which are able to base logic on whether an optional capturing group has participated in the match so far. Here's an example:

(a)?b(?(1)c|d)

That matches only "bd" and "abc". The pattern can be expressed as follows:

(if_matched)?inner_pattern(?(1)then|else)

Here's a comparable pattern I created which doesn't require support for conditionals:

(?=(a)()|())\1?b(?:\2c|\3d)

Note that to use it without an "else" part, you still need to include the second empty backreference (in this case, "\3") at the end, like this:

(?=(a)()|())\1?b(?:\2c|\3)
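To try the patterns above side by side, here is a minimal sketch in Python. It assumes Python's re module, which happens to support capturing-group-based conditionals natively and which fails a backreference to a non-participating group, so both the real and the mimicked versions can be run together. The helper name `full` is just for illustration.

```python
import re

# Real capturing-group-based conditional.
NATIVE = re.compile(r"(a)?b(?(1)c|d)")

# Mimicked version from the text: empty groups 2 and 3 record which
# option inside the lookahead matched.
MIMIC = re.compile(r"(?=(a)()|())\1?b(?:\2c|\3d)")

# Variant without an "else" part; the trailing \3 is still required.
MIMIC_NO_ELSE = re.compile(r"(?=(a)()|())\1?b(?:\2c|\3)")


def full(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


for text in ["abc", "bd", "abd", "bc", "ab", "b"]:
    # Columns: input, real conditional, mimic, mimic without "else".
    print(text, full(NATIVE, text), full(MIMIC, text), full(MIMIC_NO_ELSE, text))
```

With these inputs, the real and mimicked patterns should agree on every string: only "abc" and "bd" match in full. Other engines may behave differently, as discussed next.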

As a brief explanation of how that works: the lookahead at the beginning contains an empty alternation option, so the lookahead always succeeds and never constrains the match. At the same time, the intentionally empty capturing groups within the alternation record which option in the lookahead matched, and the then/else part is selected by backreferencing them. However, there are a couple of issues:

  • This doesn't work with some regex engines, due to how they handle backreferences for non-participating capturing groups. For example, this does not work in Firefox, which treats non-participating capturing groups as if they matched an empty string.
  • It interacts with backtracking differently than a real conditional (the "a" part is treated as if it were within an optional, atomic group, e.g., (?>(a))? instead of (a)?), so it's best to think of this as a new operator which is similar to a conditional.
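The backtracking difference in the second point can be seen with a condition that is able to give characters back. The following is a sketch under the same assumption as before (Python's re module); the patterns are variations made up for illustration, with "a+" as the condition and an extra "a" in the inner pattern.

```python
import re

# Real conditional: (a+) first grabs "aa", then backtracks to "a" so that
# the following "ab" can match.
REAL_PLUS = re.compile(r"(a+)?ab(?(1)c|d)")

# Mimic: the lookahead captures "aa" once and is not re-entered, so \1 can
# only match "aa" or be skipped entirely; it cannot shrink to "a".
MIMIC_PLUS = re.compile(r"(?=(a+)()|())\1?ab(?:\2c|\3d)")

text = "aabc"
real_match = REAL_PLUS.fullmatch(text)
mimic_match = MIMIC_PLUS.fullmatch(text)

print(real_match is not None)   # real conditional matches the whole string
print(mimic_match is not None)  # mimic does not
```

Here the real conditional matches all of "aabc" with group 1 holding "a", while the mimicked version fails to match the whole string, which is the atomic-like behavior described above.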

As for lookaround-based conditionals, we can mimic them using the same concepts. Here's what real lookaround-based conditionals look like (this example uses a positive lookahead for the assertion):

(?(?=if_assertion)then|else)

And here's how you can mimic it:

(?:(?=if_assertion()|())\1then|\2else)

Again, to use it without an "else" part, you still need to include the second empty backreference (in this case, "\2") at the end, like this:

(?:(?=if_assertion()|())\1then|\2)

Notes:

  • Backtracking does not come into play with lookaround-based conditionals in the same way as with capturing-group-based conditionals. As a result, mimicked lookaround-based conditionals are functionally identical to their "real" counterparts.
  • (?:(?=if_assertion()|())\1then|\2else) is functionally equivalent to (?=if_assertion()|())(?:\1then|\2else)
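As a concrete sketch, here is a mimicked lookahead-based conditional in Python, whose re module has no native lookaround-based conditionals. It uses the second form from the note above, with the lookahead placed in front of the alternation; whether the first form behaves identically in Python is not checked here. The task is hypothetical: if the next character is a digit, match two word characters, otherwise match four. The function name `first_token` is likewise made up for this example.

```python
import re

# Mimics (?(?=\d)\w{2}|\w{4}):
#   if   the next character is a digit   -> group 1 is set (empty)
#   else                                 -> group 2 is set (empty)
# and \1 / \2 then gate the "then" and "else" parts.
TOKEN = re.compile(r"(?=\d()|())(?:\1\w{2}|\2\w{4})")


def first_token(text):
    """Return the text matched at the start of `text`, or None."""
    m = TOKEN.match(text)
    return m.group(0) if m else None


print(first_token("1abc"))  # condition holds: "then" part, two characters
print(first_token("abcd"))  # condition fails: "else" part, four characters
print(first_token("1"))     # condition holds but "then" fails: no fallback to "else"
```

The last call illustrates that the "else" part is not tried once the assertion has succeeded, just as with a real conditional.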

For a compatibility table detailing support for these constructs with all the regex engines I've tested them with, see StevenLevithan.com: Mimicking Regular Expression Conditionals.

Published 01 June 07 07:00 by Stevezilla00