
javascript proper case for special case names

Last post 02-04-2010, 3:27 PM by mash. 8 replies.
  •  01-30-2010, 1:39 PM 59120

    javascript proper case for special case names

    I am working on a function in javascript that will turn a string into proper case. I found a working example like this:

    function toProperCase(s){
        return s.toLowerCase().replace(/^(.)|\s(.)/g,
            function($1) { return $1.toUpperCase(); });
    }

    and another one that does the same thing like this:

    function toProperCase(s){
        return s.toLowerCase().replace(/\b(\w)/g,
            function($1) { return $1.toUpperCase(); });
    }

    I would like to extend the functionality to special case names like those that begin with "Mc" or "Mac", or that have an apostrophe in them, like my own "D'Orazio".

    I can't quite figure out how to use non-capturing groups though. This was my go at it:

    function toProperCase(s){
        return s.toLowerCase().replace(/\b(\w)|(?:Mc)(.)/g,
            function($1) { return $1.toUpperCase(); });
    }

    From what I understand, "(?: )" is supposed to mean "match but do not capture this group", so "(?:Mc)(.)" should match "Mcd" in "Mcdonald" but should only capture, and therefore capitalize, the "d". Instead the script continues to capitalize "MCD".

    How should this be constructed? 



    "Everything should be made as simple as possible, but not simpler."
    (Albert Einstein)
  •  01-31-2010, 4:16 PM 59130 in reply to 59120

    Re: proper case

    Your understanding of non-capturing groups is fine; your application is the problem, because that part of your expression will never match. If your string is "Mcdonald", the \w part of your pattern will always match "M" first, satisfying the pattern. Even if the second pattern in your alternation were ever applied, your replacement would be wrong. However many groups are in your pattern, they are all returned whether they matched or not. http://regexadvice.com/blogs/mash/archive/2007/06/01/You_2700_ve-got-your-sub_2D00_matches-in-my-matches.aspx
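To make that concrete, here is a small sketch that records what the replace callback actually receives for the pattern from the question. The helper name `collectCallbackArgs` is made up for illustration, and the sketch assumes a current JavaScript engine, where a group that did not take part in the match is passed as `undefined`.

```javascript
// Hypothetical helper: records the arguments the replace callback gets for every match.
function collectCallbackArgs(input, pattern) {
  const calls = [];
  input.replace(pattern, function (match, g1, g2) {
    calls.push({ match: match, g1: g1, g2: g2 });
    return match; // leave the text unchanged, we only want to observe
  });
  return calls;
}

// As in the question, the string is lower-cased before the pattern is applied.
const lowered = collectCallbackArgs("Mcdonald".toLowerCase(), /\b(\w)|(?:Mc)(.)/g);

// Even without lower-casing, the first alternative wins at the start of the word.
const untouched = collectCallbackArgs("Mcdonald", /\b(\w)|(?:Mc)(.)/g);

console.log(lowered);   // one match: the leading "m", with g2 undefined
console.log(untouched); // one match: the leading "M", with g2 undefined
```

In both runs the second alternative never gets a chance: the first alternative consumes the leading letter, and the following positions are neither at a word boundary nor at the start of "Mc".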

    Given your task and language, you could do this instead.

    function getUpperCase(s, g1, g2){
        return g1 + g2.toUpperCase();
    }
    function toProperCase(s){
        return s.replace(/^(Ma?c)?(\w)/,getUpperCase);
    }
    var string = "Mcdonald";
    toProperCase(string);
     


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  02-03-2010, 12:22 PM 59241 in reply to 59130

    Re: SOLUTION javascript function toProperCase() for special case names

    Your example didn't seem to work. I kept fiddling around and came up with this:
     
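    function toProperCase(s)
    {
      return s.toLowerCase().replace( /\b((m)(a?c))?(\w)/g,
              function($1, $2, $3, $4, $5) {
                  // $1 = whole match, $2 = "mc"/"mac" prefix (if present),
                  // $3 = "m", $4 = "c" or "ac", $5 = the letter that follows
                  if($2){return $3.toUpperCase()+$4+$5.toUpperCase();}
                  return $1.toUpperCase();
              });
    }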
     
    I had no luck with non-capturing groups. I tried alerting the groups to see the results,
    and even when I put "?:" in the parentheses the group was still alerted.
    Maybe javascript has a special handling of non-capturing groups.
    Anyways the above function works quite well, even with names with an apostrophe (O'Brien, O'Reilly, D'Orazio...),
    because whatever is after the apostrophe is treated as a new word.

    "Everything should be made as simple as possible, but not simpler."
    (Albert Einstein)
  •  02-03-2010, 1:16 PM 59243 in reply to 59241

    Re: SOLUTION javascript function toProperCase() for special case names

    No, there is no special handling of non-capturing groups in JavaScript. I am getting the impression that you are confusing non-captured with not included. Non-captured simply means it won't be a separate value in the groups collection; it does not mean it wasn't part of the match, or part of another capture group.
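A quick way to see the difference is to dump every argument the replace callback receives, once with a capturing prefix group and once with a non-capturing one. This is only a sketch; `callbackArgs` is a made-up helper name. The callback always gets the whole match first, then one argument per capturing group, then the match offset and the input string.

```javascript
// Hypothetical helper: returns all arguments passed to the callback for the first match.
function callbackArgs(input, pattern) {
  let seen = null;
  input.replace(pattern, function (...args) {
    if (seen === null) {
      seen = args;
    }
    return args[0]; // put the matched text back unchanged
  });
  return seen;
}

// Capturing prefix: whole match, "mc", "d", offset, input.
const withCapture = callbackArgs("mcdonald", /\b(ma?c)?(\w)/);

// Non-capturing prefix: whole match, "d", offset, input.
const withoutCapture = callbackArgs("mcdonald", /\b(?:ma?c)?(\w)/);

console.log(withCapture);    // [ 'mcd', 'mc', 'd', 0, 'mcdonald' ]
console.log(withoutCapture); // [ 'mcd', 'd', 0, 'mcdonald' ]
```

With the non-capturing version the first argument still contains "mc", because it is the whole match. If the callback declares more parameters than there are groups, the extra ones receive the offset and the input string, which can look as if a group were still being captured.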


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  02-03-2010, 4:40 PM 59245 in reply to 59243

    Re: SOLUTION javascript function toProperCase() for special case names

    I'm afraid I do not understand... In my understanding I translate the following:

    "\b(?:ma?c)?(\w)"

    as: "if you find 'mc' or 'mac' at the beginning of a word followed by another letter within that word, then don't remember 'mc' or 'mac' but remember only that following letter"

    so that if I begin working with group callbacks in javascript (as in the functions mentioned in the above posts), my group $1 is going to be "d" in the case of "Mcdonald" while "mc" will not be in any group. Actually the "m" of "mcdonald" would be in group $1, being at the beginning of the word it's going to be the first captured match, so I'm not sure what "d" would become, perhaps group $2?. But what is happening is that "mc" and "mac" are still being recognized as group $2 or whatever. Shouldn't they not be in the javascript group variables if not-captured?

    But in any case is there any way of replacing the "d" without replacing the "mc" or "mac" using non-capturing groups in a javascript replace function? Can my working function be re-written using non-capturing groups?


    "Everything should be made as simple as possible, but not simpler."
    (Albert Einstein)
  •  02-03-2010, 6:08 PM 59248 in reply to 59245

    Re: SOLUTION javascript function toProperCase() for special case names

    What Michael is alluding to is that, especially for replacements, you need to account for EVERY character from the first one that takes part in the overall match to the last.

    Also, you need to understand that the regex engine will assume that there are a set of parentheses around the whole pattern which it refers to as Match Group #0. If you were to simply try to use the pattern:

    \b(?:ma?c)?(\w)

    on the text lines:

    mcdonald
    macdonald
    wombat

    you would get a match of "mcd", "macd" and "w" from the regex engine for each line. You would ALSO get "d", "d" and "w" in match group #1 for each match. (There is no match group #2 in this pattern).

    The key thing to understand is that a non-capturing group will not create a separately numbered (or named) match group, but the characters WILL be included in the overall match group #0 text. You cannot "skip over" characters in this group.
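The three test lines can be checked directly in JavaScript. The following is a minimal sketch using `RegExp.prototype.exec`; the variable names are arbitrary.

```javascript
const pattern = /\b(?:ma?c)?(\w)/;
const lines = ["mcdonald", "macdonald", "wombat"];

const results = lines.map(function (line) {
  const m = pattern.exec(line);
  return {
    whole: m[0],              // match group #0: everything the pattern consumed
    group1: m[1],             // the only numbered group: the letter after the prefix
    groupCount: m.length - 1  // number of capturing groups in the pattern
  };
});

console.log(results);
// whole/group1 pairs: "mcd"/"d", "macd"/"d", "w"/"w"; groupCount is 1 each time
```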

    The problem comes when you try to use the regex "replace" function. What it will do is remove all of the matched characters (i.e. those in match group #0) and then use your replacement string to build up whatever should go in its place. Therefore, if you simply tried to use the '$1' replacement string on the above text, you would get

    donald
    donald
    wombat

    as the "mcd", "macd" and "w" have been replaced by the "d", "d" and "w" respectively. The remaining characters have not been touched at all.
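The same result can be reproduced in JavaScript with a plain replacement string. This is just a sketch of the behaviour described above, not a suggested solution.

```javascript
const lines = ["mcdonald", "macdonald", "wombat"];

// "$1" puts back only group #1, so the non-captured "mc"/"mac" is dropped.
const replaced = lines.map(function (line) {
  return line.replace(/\b(?:ma?c)?(\w)/, "$1");
});

console.log(replaced); // [ 'donald', 'donald', 'wombat' ]
```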

    Therefore, if you want to do something with the "d" but leave the preceding characters (i.e. the "mc", "mac" or null string in each of the above examples) alone, you will need to account for them in the replacement string. In order to be able to refer to them, you must have captured the matching text. Therefore to (say) put a "%" before the first letter (ignoring the "mc" or "mac" parts) then you would need a pattern of

    \b(ma?c)?(\w)

    and a replacement string of

    $1%$2

    The pattern basically works the same way as before, but this time we need to capture the "mc" or "mac" text if it is there so we can put it back into the replacement string. If there is anything matched, it will go into match group #1. The next letter will go into match group #2.

    The replacement string is built up from whatever text was captured by match group #1, the "%" character and the text captured by match group #2. If the string was "wombat", then match group #1 will contain a null string, which is not a problem in this case: the end result will simply be that the replacement string is "%w". Remembering that the replacement function will remove ALL matched characters (i.e. match group #0), it will delete the "w" from "wombat" and replace it with "%w", resulting in "%wombat".

    If the original string was "mcdonald", then match group #1 will be "mc" so the replacement string is "mc%d". Again, the regex engine will remove all matched characters - the "mcd" in this case, and replace it with the built-up string to give "mc%donald".
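In JavaScript the walkthrough above looks roughly like this. It is a sketch only; the "%" marker is the placeholder from the explanation, not something a real proper-case function would insert.

```javascript
const lines = ["mcdonald", "macdonald", "wombat"];

// Group #1 is the optional "mc"/"mac" prefix, group #2 is the letter after it.
// An unmatched group #1 is substituted as an empty string in the replacement text.
const marked = lines.map(function (line) {
  return line.replace(/\b(ma?c)?(\w)/, "$1%$2");
});

console.log(marked); // [ 'mc%donald', 'mac%donald', '%wombat' ]
```

A replacement string can only rearrange captured text. To change the case of a captured letter, a callback function is needed in place of the string, as in the earlier posts.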

    Susan

  •  02-04-2010, 2:48 AM 59251 in reply to 59248

    Re: SOLUTION javascript function toProperCase() for special case names

    Hopefully Susan's explanation made my statement clearer. The use of non-capturing groups in this particular instance is of no benefit. The original sample code which I provided, which does work as is, shows how you'd use capture groups to handle the pieces of what was matched: Match = group 0 = "Mcd", g1 = group 1 = "Mc", g2 = group 2 = "d". I should also point out that the sample code is just that, a hardcoded sample, not an all-case solution. I only attempted to address the one example you had mentioned. The problem you are going to have in dealing with human names is that there are no hard and fast rules to follow. You mentioned names like Mac and Mc, but what about "de" and "del"? Even with Mac, take a name like "Macon" (I looked one up in the local phone book): is it supposed to be MacOn or Macon? Creating a perfect all-case regex solution is probably not possible.

    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  02-04-2010, 12:31 PM 59288 in reply to 59251

    Re: SOLUTION javascript function toProperCase() for special case names

    Ok I believe I have a little bit of a better understanding of non-capturing groups... Basically the way I constructed it is fine then and there is no need for non-capturing groups. And you are right that the script doesn't yet account for names with "de" or "la" or "von" or "van", or other cases like "Macon". It's a first attempt at getting a special case names ProperCase function seeing that no one has really done one yet (at least a simple one). At least this starts off pretty simple, it's only a few lines of script. 

    (mash says that his script does work... well for example when you call your first function from the second one you don't put parentheses after it, so the script sees it as an undefined variable... And anyway it seems to me that it's simpler to have everything in one script than in two...)

    In any case, here is a link with an example of the script in action. It can surely be improved for other special cases; for example, I would now like to know how to tell regex to ignore words such as "de" or "la". I suppose you could always do a double-pass script with one regex being called after another, but it wouldn't be bad if it could all be done in one...

    http://johnrdorazio.altervista.org/SitoFlatnukePersonale/index.php?mod=javascript_topropercase&lang=en 
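One possible way to leave words such as "de" or "la" alone in a single pass is to match whole words and decide inside the callback. This is only a rough sketch under several assumptions: the function name `toProperCaseWithParticles` and the particle list are made up for illustration, the list is far from complete, and only ASCII letters are handled.

```javascript
// Hypothetical, deliberately incomplete list of particles to keep in lower case.
const LOWERCASE_PARTICLES = new Set(["de", "del", "la", "von", "van"]);

function toProperCaseWithParticles(s) {
  // Match each run of ASCII letters; an apostrophe ends a word, so the part
  // after it ("orazio" in "d'orazio") is treated as a new word.
  return s.toLowerCase().replace(/[a-z]+/g, function (word) {
    if (LOWERCASE_PARTICLES.has(word)) {
      return word; // leave the particle untouched
    }
    // Optional "mc"/"mac" prefix, then the letter that should be capitalized.
    return word.replace(/^(ma?c)?([a-z])/, function (match, prefix, letter) {
      const head = prefix ? prefix.charAt(0).toUpperCase() + prefix.slice(1) : "";
      return head + letter.toUpperCase();
    });
  });
}

console.log(toProperCaseWithParticles("LUDWIG VAN BEETHOVEN")); // Ludwig van Beethoven
console.log(toProperCaseWithParticles("ronald mcdonald"));      // Ronald McDonald
console.log(toProperCaseWithParticles("john d'orazio"));        // John D'Orazio
```

The limits already mentioned in this thread still apply: "macon" comes out as "MacOn", and a particle at the start of a name stays in lower case, so any real use would need its own exception list.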


    "Everything should be made as simple as possible, but not simpler."
    (Albert Einstein)
  •  02-04-2010, 3:27 PM 59311 in reply to 59288

    Re: SOLUTION javascript function toProperCase() for special case names

    Lwangaman:
    (mash says that his script does work... well for example when you call your first function from the second one you don't put parentheses after it, so the script sees it as an undefined variable... And anyway it seems to me that it's simpler to have everything in one script than in two...)

     No, there are not supposed to be parentheses after the function name in the replace parameter. Of course, I can't see how you have implemented the sample differently. You may have introduced a typo in your testing. If you are getting an "undefined variable" error, then you have changed something. If you stick the following into the body element of a web page, you should get an alert of "McDonald" when viewing the page in a web browser.

     <script type="text/javascript">
         function getUpperCase(s, g1, g2){
             return g1 + g2.toUpperCase();
         }

         function toProperCase(s){
             return s.replace(/^(Ma?c)?(\w)/, getUpperCase);
         }

         var string = "Mcdonald";
         alert(toProperCase(string));
     </script>
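The difference between passing the function and calling it can be shown outside a browser as well. The sketch below reuses `getUpperCase` from the sample and swaps `alert` for `console.log`; the variable names `byReference` and `callError` are arbitrary.

```javascript
function getUpperCase(s, g1, g2) {
  return g1 + g2.toUpperCase();
}

const pattern = /^(Ma?c)?(\w)/;

// Passing the function itself: replace() calls it with (match, group 1, group 2).
const byReference = "Mcdonald".replace(pattern, getUpperCase);

// Adding parentheses calls the function immediately with no arguments,
// so g2 is undefined and g2.toUpperCase() throws before replace() even runs.
let callError = null;
try {
  "Mcdonald".replace(pattern, getUpperCase());
} catch (e) {
  callError = e;
}

console.log(byReference);                    // McDonald
console.log(callError instanceof TypeError); // true
```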


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein