RegexAdvice

Capitalise each word and strip non-alphanumeric characters

Last post 07-04-2006, 11:34 PM by Xicheng Jia. 3 replies.
  •  07-04-2006, 7:47 AM 19266

    Capitalise each word and strip non-alphanumeric characters

    Hi,

    I need a regular expression that will capitalise each word, strip any characters that are not alphanumeric and then strip any spaces.

    For example, "Smith & Jones are #1" would become "SmithJonesAre1".

    I'm working with this in C#.

    Thanks in advance.

  •  07-04-2006, 12:32 PM 19270 in reply to 19266

    Re: Capitalise each word and strip non-alphanumeric characters

    //Start of algorithm

    1. Match every word that starts with a lowercase letter, such as *are*, using the Regex.Replace method.

    2. Pass each match (word) to a delegate function (a MatchEvaluator).

    3. The MatchEvaluator replaces the first (lowercase) character with its uppercase equivalent.

    4. The MatchEvaluator returns the altered word to the Regex.Replace method:

    [as a result, the original string "Smith & Jones are #1" becomes "Smith & Jones Are #1"]

    5. Call Regex.Replace a second time: replace every whitespace or other non-alphanumeric character with an empty string:

    [as a result, the current string "Smith & Jones Are #1" becomes "SmithJonesAre1"]

    //End of algorithm

    // Now implement this using the C# Regex class...
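A minimal C# sketch of these steps using the .NET Regex class, assuming "alphanumeric" means ASCII letters and digits; the class and method names (TwoPassCamelizer, ToPascalCase) are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Sketch of the two-pass algorithm; TwoPassCamelizer/ToPascalCase are illustrative names.
public static class TwoPassCamelizer
{
    // Step 1: a word that starts, at a word boundary, with a lowercase ASCII letter.
    private static readonly Regex LowercaseWord = new Regex(@"\b[a-z]\w*");

    // Step 5: any character that is not an ASCII letter or digit (spaces included).
    private static readonly Regex NonAlphanumeric = new Regex(@"[^A-Za-z0-9]");

    public static string ToPascalCase(string input)
    {
        // Steps 2-4: the lambda is converted to a MatchEvaluator; it receives each
        // matched word and returns it with its first character uppercased.
        string capitalised = LowercaseWord.Replace(input, m =>
            char.ToUpperInvariant(m.Value[0]) + m.Value.Substring(1));

        // Step 5: a second Regex.Replace call removes everything non-alphanumeric.
        return NonAlphanumeric.Replace(capitalised, string.Empty);
    }
}
```

Only words that start with a lowercase letter are touched, so the rest of each word keeps its original case: "SMITH and jones" becomes "SMITHAndJones".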

  •  07-04-2006, 8:02 PM 19280 in reply to 19266

    Re: Capitalise each word and strip non-alphanumeric characters

    Hi,

    I don't know whether C# supports the \U, \L, \u and \l escapes, which change the case of the letters that follow: for example, "\Usmith" prints "SMITH" and "\usmith" prints "Smith".

    In Perl, your problem can be solved with a single s/// expression:

        s/(\w)(\w*)|./\u$1\L$2/g

    If you also want to remove the underscore '_', change \w to [0-9a-zA-Z].

    Under C#, even if you cannot use \u and \L, you could probably still use a similar pattern:

        (\w)(\w*)|.

    and then write a function that takes $1 and $2 as arguments, converts $1 to uppercase and $2 to lowercase, and returns the two joined together.
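.NET replacement strings support substitutions such as $1 and $2 but have no case-changing escapes like \u or \L, so in C# the function described above becomes a MatchEvaluator. A minimal sketch, using the [0-9a-zA-Z] variant so underscores are dropped too; SinglePassCamelizer and ToPascalCase are hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Sketch only: SinglePassCamelizer and ToPascalCase are hypothetical names.
public static class SinglePassCamelizer
{
    // Alternative 1: a run of ASCII letters/digits, first character in group 1,
    // the rest in group 2. Alternative 2 ('.'): any other single character.
    // Singleline lets '.' consume newlines too.
    private static readonly Regex WordOrOther =
        new Regex(@"([0-9A-Za-z])([0-9A-Za-z]*)|.", RegexOptions.Singleline);

    public static string ToPascalCase(string input)
    {
        // The lambda plays the role of the function taking $1 and $2.
        // When '.' matched, both groups are unsuccessful and their Value is "",
        // so that character is replaced with nothing.
        return WordOrOther.Replace(input, m =>
            m.Groups[1].Value.ToUpperInvariant() + m.Groups[2].Value.ToLowerInvariant());
    }
}
```

RegexOptions.Singleline is an addition so that '.' also removes line breaks. Because the rest of each word is lowercased, "SMITH and jones" becomes "SmithAndJones"; and because a digit can be the "first letter", "$%^123SMIth##@@" comes out as "123smith".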

    Good luck,
    Xicheng


    perl -le 'print"So~*kde~box*DS*Zoxf*fe|er"^$\x23'
  •  07-04-2006, 11:34 PM 19284 in reply to 19280

    Re: Capitalise each word and strip non-alphanumeric characters

    BTW, it might be better to change the pattern and replacement to:

    pattern: ([0-9]*)([a-zA-Z]?)([a-zA-Z]*)|.
    replacement: $1\u$2\L$3

    which turns $%^123SMIth##@@ into 123Smith instead of 123smith.
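A C# port of this pattern needs one adjustment. The first alternative can match an empty string; Perl then refuses a second zero-length match at the same position and falls back to '.', whereas .NET, as far as I know, simply advances one character, which would leave symbols such as $ in the output. The sketch below therefore adds a (?=[0-9A-Za-z]) lookahead (not part of the original pattern) and uses a MatchEvaluator in place of \u and \L; DigitAwareCamelizer and ToPascalCase are hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Sketch only: DigitAwareCamelizer and ToPascalCase are hypothetical names.
public static class DigitAwareCamelizer
{
    // Group 1: leading digits; group 2: the first letter; group 3: remaining letters.
    // The lookahead (not in the Perl original) requires at least one letter or
    // digit, so the first alternative can never match an empty string.
    private static readonly Regex Pattern = new Regex(
        @"(?=[0-9A-Za-z])([0-9]*)([a-zA-Z]?)([a-zA-Z]*)|.",
        RegexOptions.Singleline);

    public static string ToPascalCase(string input)
    {
        // Equivalent of "$1\u$2\L$3": digits unchanged, first letter uppercased,
        // remaining letters lowercased; '.' matches become empty strings.
        return Pattern.Replace(input, m =>
            m.Groups[1].Value
            + m.Groups[2].Value.ToUpperInvariant()
            + m.Groups[3].Value.ToLowerInvariant());
    }
}
```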

    ----

    BTW, I just found a very interesting behaviour of \u and \L in Perl.

    The pattern can become: ([0-9]*)([a-zA-Z]*)|.
    The replacement: $1\u\L$2

    \u\L capitalizes the first letter and at the same time lowercases the following letters. I don't know whether other programming languages can do this, though. :-)
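C# replacement strings have no \u\L, but a MatchEvaluator gets the same effect by lowercasing the letters and then uppercasing the first one. A sketch with a hypothetical helper, UpperFirstLowerRest; as with any C# port of this pattern, a (?=[0-9A-Za-z]) lookahead is added so the first alternative cannot match an empty string:

```csharp
using System.Text.RegularExpressions;

// Sketch only: UpperLowerCamelizer and its members are hypothetical names.
public static class UpperLowerCamelizer
{
    // Emulates Perl's "\u\L$2": lowercase the whole string, then uppercase
    // its first character. Empty input is returned unchanged.
    public static string UpperFirstLowerRest(string s)
    {
        if (s.Length == 0) return s;
        string lower = s.ToLowerInvariant();
        return char.ToUpperInvariant(lower[0]) + lower.Substring(1);
    }

    // Group 1: leading digits; group 2: the letters that follow. The lookahead
    // (an addition) stops the first alternative from matching an empty string.
    private static readonly Regex DigitsThenLetters = new Regex(
        @"(?=[0-9A-Za-z])([0-9]*)([a-zA-Z]*)|.", RegexOptions.Singleline);

    // Equivalent of the replacement "$1\u\L$2"; '.' matches become "".
    public static string ToPascalCase(string input) =>
        DigitsThenLetters.Replace(input,
            m => m.Groups[1].Value + UpperFirstLowerRest(m.Groups[2].Value));
}
```

TextInfo.ToTitleCase might look like a shortcut, but it leaves words that are entirely uppercase unchanged (treating them as acronyms), so it would not turn "SMITH" into "Smith".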

    Xicheng

