
Regular expression for parsing String

Last post 03-27-2012, 9:03 PM by Aussie Susan. 1 replies.
  •  03-27-2012, 2:12 PM 84844

    Regular expression for parsing String

    Hello all,

    I am new to regular expressions.

    I want a regular expression that allows only 10 characters in a string.

    It should not allow special characters (such as comma, hyphen, asterisk, whitespace, ...) or numerals.

    Thanks in advance.

  •  03-27-2012, 9:03 PM 84845 in reply to 84844

    Re: Regular expression for parsing String

    Depending on the regex variant you are using (they are not all the same, and some don't offer the features I'm talking about here), there are 2 basic approaches you can take.

    The first is to list all of the characters that you DO want in a character set definition and use a quantifier as in:

    ^[a-z$]{10}$

    which will match exactly 10 characters, each of which is a lowercase letter or the dollar sign (use [a-zA-Z$] or a case-insensitive option if uppercase letters should be allowed as well). You can add in whatever other characters you want to allow. However, if you are not careful (or you can't specify character ranges), then this can lead to a very large character set definition.
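
    As a rough illustration of the allow-list approach above, here is a small Python sketch. The name is_valid is made up for this example, and fullmatch is used in place of the ^ and $ anchors (in Python, $ would also accept a trailing newline).

```python
import re

# Exactly ten characters, each a lowercase ASCII letter or a dollar sign.
ALLOWED = re.compile(r"[a-z$]{10}")

def is_valid(text: str) -> bool:
    # fullmatch only succeeds if the whole string matches the pattern.
    return ALLOWED.fullmatch(text) is not None

print(is_valid("abcdefghij"))  # True
print(is_valid("abcde$ghij"))  # True
print(is_valid("Abcdefghij"))  # False: uppercase is not in [a-z]
print(is_valid("abcdefghi"))   # False: only 9 characters
```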

    As you have asked your question in the "Performance" forum, I should point out that the size of the character set definition is not (normally) an issue at run time. The first time the regex parser sees the pattern, it will typically construct a bitmap of all of the characters in the set; at run time all it does is convert each text character to a bit position and test whether that bit is set in the bitmap, which can happen quite quickly.
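
    To make the bitmap idea concrete, here is a toy Python sketch. It only illustrates the principle and is not how any particular regex engine is implemented; build_bitmap and in_set are names invented for this example.

```python
def build_bitmap(chars: str) -> int:
    # Done once, when the pattern is compiled: set one bit per allowed character.
    bitmap = 0
    for ch in chars:
        bitmap |= 1 << ord(ch)
    return bitmap

def in_set(bitmap: int, ch: str) -> bool:
    # Done per input character: one shift and one mask,
    # regardless of how many characters the set contains.
    return ((bitmap >> ord(ch)) & 1) == 1

LOWER_AND_DOLLAR = build_bitmap("abcdefghijklmnopqrstuvwxyz$")

print(in_set(LOWER_AND_DOLLAR, "q"))  # True
print(in_set(LOWER_AND_DOLLAR, "$"))  # True
print(in_set(LOWER_AND_DOLLAR, "7"))  # False
```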

    If you are excluding only a few characters (in comparison to those that you are allowing), then it might be easier to write and maintain something like:

    ^[^-,*\d\s]{10}$
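
    A quick Python check of the exclusion pattern above might look like the following (is_valid_excluding is a hypothetical helper name). Note that anything not listed in the negated set is still accepted, so "!" passes here.

```python
import re

# Exactly ten characters, none of which is a hyphen, comma, asterisk,
# digit or whitespace character.
NOT_EXCLUDED = re.compile(r"[^-,*\d\s]{10}")

def is_valid_excluding(text: str) -> bool:
    return NOT_EXCLUDED.fullmatch(text) is not None

print(is_valid_excluding("abcdefghij"))  # True
print(is_valid_excluding("abc-efghij"))  # False: hyphen is excluded
print(is_valid_excluding("abcdefgh1j"))  # False: digit is excluded
print(is_valid_excluding("abc efghij"))  # False: whitespace is excluded
print(is_valid_excluding("abc!efghij"))  # True: "!" was never excluded
```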

    An alternative might also be:

    ^(?=((?![-,*\d\s]).)*$).{10}$

    What this does is first use a lookahead to check that none of the characters in the string is invalid (again, as defined within the character set definition); the match is rejected if any are. Once we know that all characters are valid (i.e. none are invalid), we can simply check that there are exactly 10 characters in the string.
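
    The lookahead version can be tried out in Python as shown below, assuming a regex variant that supports lookahead (Python's re module does). The helper name is_valid_lookahead is invented for this sketch.

```python
import re

# (?=((?![-,*\d\s]).)*$)  lookahead: every character up to the end of the
#                         string must not be one of the invalid characters
# .{10}$                  then: the string must be exactly ten characters long
LOOKAHEAD = re.compile(r"^(?=((?![-,*\d\s]).)*$).{10}$")

def is_valid_lookahead(text: str) -> bool:
    return LOOKAHEAD.search(text) is not None

print(is_valid_lookahead("abcdefghij"))  # True
print(is_valid_lookahead("abc,efghij"))  # False: comma is invalid
print(is_valid_lookahead("abcdefgh1j"))  # False: digit is invalid
print(is_valid_lookahead("abcdefghi"))   # False: only 9 characters
```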

    Performance-wise, this will possibly take a bit longer, as the text string is parsed twice. However, the same technique can be extended to situations where you not only want to disallow some characters but also require at least 1 (or some other number) of characters from a different set. This typically comes up in password validations, where you need at least 1 alpha, 1 numeric and 1 special character etc., but not (say) whitespace. The performance hit is probably not noticeable in most environments unless this check is being performed thousands of times a second.
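
    As a sketch of that password-style extension, the Python example below stacks several lookaheads in front of the length check. The rules are hypothetical and chosen only for illustration: exactly 10 non-whitespace characters, with at least one ASCII letter, one digit and one character from the made-up special set !@#$%.

```python
import re

PASSWORD = re.compile(
    r"(?=.*[A-Za-z])"  # at least one letter somewhere in the string
    r"(?=.*\d)"        # at least one digit
    r"(?=.*[!@#$%])"   # at least one character from the (assumed) special set
    r"\S{10}"          # exactly ten characters, none of them whitespace
)

def is_valid_password(text: str) -> bool:
    return PASSWORD.fullmatch(text) is not None

print(is_valid_password("abc123!xyz"))  # True
print(is_valid_password("abcdefghij"))  # False: no digit, no special character
print(is_valid_password("abc12 !xyz"))  # False: contains whitespace
print(is_valid_password("ab1!"))        # False: too short
```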

    Susan
