Check out XRegExp now, or track the latest development progress and contribute on GitHub.
Here's XRegExp's abbreviated feature list from the brand new xregexp.com (which includes extensive documentation and code examples):
The full list of changes can be seen in the changelog.
- New: Regular Expressions Cookbook
Python, Ruby, and VB.NET), but it's also useful for non-programmers. The majority of the content covers regular expressions more generally, and most of the regexes will work fine in your favorite regular-expression-enabled text editor or other tool. The book is targeted at people with regex skills from beginner to upper intermediate, and there's a fair amount of information in there even for people who already consider themselves regex experts. Here is O'Reilly's press release for the book.
Don't forget to pick up a copy of your very own.
Here's the feature list:
- Added regex syntax:
  - Comprehensive named capture support.
  - Comment patterns:
- Added regex modifiers (flags):
  - s (singleline), to make dot match all characters including newlines.
  - x (extended), for free-spacing and comments.
- Added awesome:
  - Reduced cross-browser inconsistencies.
  - Recursive-construct parser with regex delimiters.
  - An easy way to cache and reuse regex objects.
  - The ability to safely embed literal text in your regex patterns.
  - A method to add modifiers to existing regex objects.
  - Regex call and apply methods, which make generically working with functions and regexes easier.
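To make a few of those bullets concrete, here's a rough sketch in plain JavaScript of what escaping literal text, caching regex objects, and adding flags to an existing regex can look like. The helper names (escapeRegex, cachedRegex, addFlags) are made up for this illustration and are not XRegExp's API; the library's own methods may be named and behave differently.

```javascript
// Hypothetical helpers for illustration only; none of these names come from XRegExp.

// Escape regex metacharacters so arbitrary text can be embedded literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Cache compiled regexes by flags + pattern so repeated lookups reuse one object.
const regexCache = new Map();
function cachedRegex(pattern, flags = "") {
  const key = `${flags}/${pattern}`;
  if (!regexCache.has(key)) {
    regexCache.set(key, new RegExp(pattern, flags));
  }
  return regexCache.get(key);
}

// Return a copy of a regex with extra flags merged in (duplicates removed).
function addFlags(regex, extraFlags) {
  const merged = [...new Set(regex.flags + extraFlags)].join("");
  return new RegExp(regex.source, merged);
}

const literal = cachedRegex(escapeRegex("$9.99 (approx.)"));
console.log(literal.test("Total: $9.99 (approx.)")); // true
console.log(cachedRegex(escapeRegex("$9.99 (approx.)")) === literal); // true
console.log(addFlags(/a.b/i, "gs").flags); // "gis"
```

Without the escaping step, characters like $, ., and ( in the price string would be read as regex syntax rather than literal text.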
- RegexPal: Web-Based Regex Testing Reinvented
At the moment, RegexPal is fairly short on features, but here are the highlights:
- Real-time regex syntax highlighting with backwards and forwards context awareness.
- Lightning-fast match highlighting with alternating styles.
- Inverted matches (match any text not matched by the regex).
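As an illustration of the inverted-matches idea, here's a minimal sketch that collects the stretches of text a regex does not match. The function name invertedMatches is an assumption for this example, and this is not RegexPal's actual code.

```javascript
// Hypothetical helper: returns the segments of `text` NOT matched by `regex`.
function invertedMatches(text, regex) {
  // matchAll requires the global flag, so add it if it is missing.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const globalRegex = new RegExp(regex.source, flags);

  const segments = [];
  let last = 0; // end index of the previous match
  for (const match of text.matchAll(globalRegex)) {
    if (match.index > last) {
      segments.push(text.slice(last, match.index));
    }
    last = Math.max(last, match.index + match[0].length);
  }
  // Anything after the final match is also unmatched text.
  if (last < text.length) {
    segments.push(text.slice(last));
  }
  return segments;
}

console.log(invertedMatches("a1b22c", /\d+/)); // [ 'a', 'b', 'c' ]
```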
A few things to be aware of:
- The approach I've used for scrollable rich-text editing (which I haven't seen elsewhere on the web) is fast but a bit buggy. Firefox and IE7 have the fewest issues, but it more or less works in other browsers as well.
- With the syntax highlighting, I generally mark corner-case issues which create cross-browser inconsistencies as errors even if they are the result of browser bugs or missing behavior documentation in ECMA-262v3.
- Line breaks take different forms across platforms and browsers. E.g., Firefox uses \n even on Windows, where nearly all programs use \r\n. This can affect the results of certain regexes.
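Here's a small sketch of how the line break difference can change a result, along with two common ways to guard against it. The sample strings and variable names are made up for illustration.

```javascript
const unixText = "foo\nbar";
const windowsText = "foo\r\nbar";

// A pattern that hard-codes \n only matches the Unix-style text.
const strict = /foo\nbar/;
// Allowing an optional \r makes the pattern line-break agnostic.
const tolerant = /foo\r?\nbar/;

// Alternatively, normalize line breaks before matching.
const normalize = (text) => text.replace(/\r\n?/g, "\n");

console.log(strict.test(unixText));               // true
console.log(strict.test(windowsText));            // false
console.log(tolerant.test(windowsText));          // true
console.log(strict.test(normalize(windowsText))); // true
```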
RegexPal, at least for me, is lots of fun to play with and helps to make learning regular expressions easy through its instant feedback. Check it out at regexpal.com.
- Mimicking Conditionals
Excited by the fact that I can mimic atomic groups when using most regex libraries which don't support them, I set my sights on another of my most wanted features which is commonly lacking: conditionals (which provide an if-then-else construct). Of the regex libraries I'm familiar with, conditionals are only supported by .NET, Perl, PCRE (and hence, PHP's preg functions), and JGSoft products (including RegexBuddy).
There are two common types of regex conditionals in those libraries: lookaround-based and capturing-group-based. I'll get to the former type in a bit, but first, I'll address capturing-group-based conditionals, which are able to base logic on whether an optional capturing group has participated in the match so far. Here's an example:
That matches only "bd" and "abc". The pattern can be expressed as follows:
Here's a comparable pattern I created which doesn't require support for conditionals:
Note that to use it without an "else" part, you still need to include the second empty backreference (in this case, "\3") at the end, like this:
As a brief explanation of how that works: an empty alternation option within the lookahead at the beginning cancels the effect of the lookahead, while the intentionally empty capturing groups within the alternation are exploited to base the then/else part on which option in the lookahead matched. However, there are a couple of issues:
- This doesn't work with some regex engines, due to how they handle backreferences for non-participating capturing groups. For example, this does not work in Firefox, which treats non-participating capturing groups as if they matched an empty string.
- It interacts with backtracking differently than a real conditional (the "a" part is treated as if it were within an optional, atomic group, e.g., (?>(a))? instead of (a)?), so it's best to think of this as a new operator which is similar to a conditional.
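The first issue is easy to see in JavaScript, where a backreference to a non-participating capturing group matches the empty string. The sketch below uses an assumed mimic pattern reconstructed from the description above (group 1 is the "if" part, and groups 2 and 3 are the empty "then" and "else" markers); treat it as an illustration rather than the exact pattern under discussion. In an engine where such backreferences fail to match, only "abc" and "bd" would be expected to match.

```javascript
// Assumed pattern, reconstructed from the description; not necessarily the exact regex.
// Group 1 = "if" part, group 2 = empty "then" marker, group 3 = empty "else" marker.
const mimic = /^(?=(a)()|())\1?b(?:\2c|\3d)$/;

const results = {};
for (const input of ["abc", "bd", "bc", "abd"]) {
  results[input] = mimic.test(input);
}
// In ECMAScript engines all four inputs match, so the then/else logic is lost.
console.log(results);

// The root cause: a backreference to a group that did not participate
// in the match succeeds by matching the empty string.
const emptyBackref = /^(a)?\1b$/.test("b");
console.log(emptyBackref); // true
```

Because "bc" and "abd" also match, the pattern degrades to roughly a?b[cd] in these engines.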
As for lookaround-based conditionals, we can mimic them using the same concepts. Here's what real lookaround-based conditionals look like (this example uses a positive lookahead for the assertion):
And here's how you can mimic it:
Again, to use it without an "else" part, you still need to include the second empty backreference (in this case, "\2") at the end, like this:
- Backtracking does not come into play with lookaround-based conditionals in the same way as with capturing-group-based conditionals. As a result, mimicked lookaround-based conditionals are functionally identical to their "real" counterparts.
(?:(?=if_assertion()|())\1then|\2else) is functionally equivalent to
For a compatibility table detailing support for these constructs with all the regex engines I've tested them with, see StevenLevithan.com: Mimicking Regular Expression Conditionals.
which I’ve called XRegExp. This script is very small (the minified version weighs in at 937 bytes), and it adds support for two simple but powerful additional flags:
- s – Dot matches all (a.k.a. singleline) mode.
- x – Free-spacing and comments mode.
It also allows you to use these flags with the RegExp constructor itself after running one line of code.
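To show what the two flags do, here's a minimal sketch in plain JavaScript. Modern engines support the s flag natively (since ES2018), while free-spacing mode has no native equivalent, so the stripFreeSpacing helper below is a hypothetical, simplified stand-in written for this example. It is not XRegExp's implementation and ignores edge cases such as a "]" placed first inside a character class.

```javascript
// s flag: dot also matches line breaks.
const dotAll = /a.b/s;
console.log(dotAll.test("a\nb"));      // true
console.log(/a.b/.test("a\nb"));       // false
console.log(/a[\s\S]b/.test("a\nb"));  // true (a common workaround without the s flag)

// Hypothetical helper: strips free-spacing whitespace and "#" line comments,
// leaving escaped characters and character classes untouched.
function stripFreeSpacing(source) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Keep the backslash together with the character it escapes.
      out += ch + (source[i + 1] || "");
      i++;
    } else if (inClass) {
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === "#") {
      // Skip the comment up to (not including) the end of the line.
      while (i + 1 < source.length && source[i + 1] !== "\n") i++;
    } else if (!/\s/.test(ch)) {
      out += ch;
    }
  }
  return out;
}

const datePattern = stripFreeSpacing(String.raw`
  (\d{4})  # year
  -
  (\d{2})  # month
`);
console.log(datePattern); // (\d{4})-(\d{2})
console.log(new RegExp(datePattern).exec("2008-07").slice(1)); // [ '2008', '07' ]
```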
Additionally, XRegExp improves some minor cross-browser regex syntax consistency
Regexes built using XRegExp are
identical in speed to those built using the native RegExp constructor, support