
Exclude norwegian characters and space

Last post 05-07-2009, 6:00 AM by Aussie Susan. 3 replies.
  •  05-05-2009, 3:35 AM 52658

    Exclude norwegian characters and space

    I need a regex that matches a string containing only the letters of the English alphabet (upper and lower case), digits, and the underscore.

    I'm trying to validate an input form built with Dreamweaver, programming in ASP.

    I have tried the standard [a-zA-Z0-9_] regex. When I enter the single character å (which is equivalent to &aring;), my regex works: the input is rejected.

    But when I add one more character to my input, e.g. "år", my regex doesn't work: the input is accepted.

    I need my regex to accept inputs like these: "123_potet", "retur_45_a", "UNIFORM_79a"

    It should NOT accept inputs like these: "are you there", "hello-world", "årlig hendelse", "24 år", "ærlige_øivind"

    Please advise.

  •  05-05-2009, 7:19 AM 52664 in reply to 52658

    Re: Exclude norwegian characters and space

    Gunnar,

    I'm not sure what your actual problem is because when I try your regex (or what should be your regex to validate a whole string):

    ^[a-z0-9_]+$

    with the 'ignore case' option set, then it matches only those items you say it should and does not match those you say it shouldn't.

    I could understand the issue if you were using '^\w+$' and the version of the regex you were using was compiled for multi-national/unicode characters, but the explicit use of the 'a-z' in the character set definition should not include any other languages' alphabetic characters.
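
    To illustrate the difference between '\w' and an explicit 'a-z' range, here is a minimal sketch. It assumes Python's re module rather than the ASP/VBScript engine used in the question, so only the general behaviour carries over; the names UNICODE_WORD, ASCII_WORD and EXPLICIT are made up for this example. In Python 3, '\w' is Unicode-aware by default for text patterns, which makes it a convenient stand-in for a "multi-national" regex build.

```python
import re

# Assumption: Python 3, where \w on str patterns is Unicode-aware by default.
UNICODE_WORD = re.compile(r"^\w+$")
# re.ASCII restricts \w to [a-zA-Z0-9_].
ASCII_WORD = re.compile(r"^\w+$", re.ASCII)
# The explicit character class never includes non-English letters.
EXPLICIT = re.compile(r"^[a-zA-Z0-9_]+$")

sample = "ærlige_øivind"

for name, pattern in [("unicode \\w", UNICODE_WORD),
                      ("ascii \\w", ASCII_WORD),
                      ("explicit class", EXPLICIT)]:
    result = "match" if pattern.search(sample) else "no match"
    print(f"{name}: {result}")
```

    Only the Unicode-aware '\w' lets the Norwegian letters through; the explicit class rejects them regardless of how the engine treats '\w'.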

    If the pattern you presented is the entire regex and you are expecting it to validate an entire string, then this may be the cause of your problem. As written, your pattern will match ANY alphanumeric or underscore character anywhere in the input string. The normal action of a regex without any anchors (in this case the '^' and '$' operators in my pattern) is to start at the beginning of the text and try to make a match. If it can't, then it moves forward 1 character and tries again. It keeps doing this until it either gets to the end of the string or finds a match.

    Your pattern, as written, will test a single character. Therefore, using the procedure described above, it will start at the first character - say the "å" in your example - and fail to match. It will then advance to the next character - the "r" in your example - and match. Therefore it will return a successful match.
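
    The scanning behaviour described above can be reproduced with a short sketch. It assumes Python's re module (not the ASP/VBScript engine from the question), and first_match is a hypothetical helper written only for this illustration.

```python
import re

# The bare character class from the original post, with no anchors.
UNANCHORED = re.compile(r"[a-zA-Z0-9_]")

def first_match(text):
    """Return the first substring the pattern matches, or None."""
    m = UNANCHORED.search(text)
    return m.group(0) if m else None

# "å" alone: no position in the string matches, so the search fails.
print(repr(first_match("å")))
# "år": the engine fails at "å", advances one character, and matches "r".
print(repr(first_match("år")))
```

    Because a single matching character anywhere is enough, the unanchored pattern reports success for "år" even though the string contains a character outside the class.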

    My version forces the match to always start at the beginning of the string (the '^') and repeatedly test each character in turn until it reaches the end of the text (the '$'). By the way, the '+' quantifier makes sure that there is at least 1 character in the text; the '*' quantifier would also allow a match on an empty string.
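
    As a rough check of the anchored pattern against the sample inputs from the first post, here is a minimal sketch. It again assumes Python's re module instead of ASP, with re.IGNORECASE standing in for the 'ignore case' option; PLUS, STAR and is_valid are names invented for this example.

```python
import re

# Anchored pattern from the reply, with the 'ignore case' option.
PLUS = re.compile(r"^[a-z0-9_]+$", re.IGNORECASE)
# Same pattern with '*' instead of '+', for comparison on empty input.
STAR = re.compile(r"^[a-z0-9_]*$", re.IGNORECASE)

def is_valid(text, pattern=PLUS):
    """True if the whole string consists only of allowed characters."""
    return pattern.search(text) is not None

accepted = ["123_potet", "retur_45_a", "UNIFORM_79a"]
rejected = ["are you there", "hello-world", "årlig hendelse",
            "24 år", "ærlige_øivind"]

for s in accepted + rejected:
    print(f"{s!r}: {'valid' if is_valid(s) else 'invalid'}")

# '+' requires at least one character; '*' also accepts the empty string.
print("empty with +:", is_valid(""))
print("empty with *:", is_valid("", STAR))
```

    One Python-specific caveat: '$' also matches just before a trailing newline, so re.fullmatch (or '\Z' in place of '$') is the stricter choice when input may end with a line break.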

    I hope I have guessed your problem and not stated what you already know.

    Susan 

  •  05-07-2009, 5:38 AM 52750 in reply to 52664

    Re: Exclude norwegian characters and space

    Hi there.

    After looking more closely at my regex, I have trimmed it down to your suggestion and now everything works like a charm.

    Thanks for the help :-)

  •  05-07-2009, 6:00 AM 52751 in reply to 52750

    Re: Exclude norwegian characters and space

    Velkommen

    Susan 
